deepseek-ai/deepseek-r1-distill-qwen-32b

Model Information

deepseek-ai/deepseek-r1-distill-qwen-32b is a distilled, instruction-tuned large language model released by DeepSeek. It is built on the Qwen2.5-32B base and fine-tuned on reasoning data distilled from DeepSeek-R1, and is optimized for reasoning, code, and math while offering faster inference and lower memory usage than the full DeepSeek-R1 model.

  • Model Developer: DeepSeek AI
  • Model Release Date: January 2025
  • Supported Languages: English and Chinese
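
One way to run the model locally is through Hugging Face Transformers. The sketch below is illustrative rather than an official quick-start: the repository id mirrors the name used on this page, and the dtype/device settings are assumptions; a 32B-parameter checkpoint typically needs multiple GPUs or quantization.

```python
# Hedged usage sketch with Hugging Face Transformers. Assumptions: the repo id below
# resolves to the published checkpoint, and enough GPU memory is available
# (device_map="auto" shards the weights across visible GPUs).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-r1-distill-qwen-32b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint config
    device_map="auto",    # shard across available GPUs
)

messages = [{"role": "user", "content": "Compute 12 * 17 and explain the steps."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```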

Model Architecture

deepseek-ai/deepseek-r1-distill-qwen-32b uses a dense, decoder-only transformer architecture inherited from its Qwen2.5-32B base, with reasoning behaviour distilled from DeepSeek-R1. It maintains strong performance while requiring far less compute and memory than the full DeepSeek-R1 model.

  • Model Type: Decoder-only transformer
  • Base Model: Qwen2.5-32B
  • Distilled By: DeepSeek AI
  • Parameters: Approximately 32B
  • Context Length: 32K tokens
  • Training:
    • Supervised fine-tuning of the Qwen2.5-32B base on reasoning data distilled from DeepSeek-R1
    • Fine-tuned for multilingual and reasoning tasks
  • Tokenizer: Compatible with the Qwen tokenizer (see the sketch after this list)
  • Key Strengths:
    • Instruction following
    • Math and code generation
    • Balanced performance and compute efficiency
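
Because the tokenizer is Qwen-compatible and the context window is 32K tokens, prompt budgeting can be handled entirely on the tokenizer side. The snippet below is a minimal sketch under those assumptions; the 32,768-token limit and the reserved output budget are illustrative values, and the repo id mirrors the name used on this page.

```python
# Minimal sketch, assuming a Qwen-compatible tokenizer ships with the checkpoint
# and a 32K-token context window as stated above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1-distill-qwen-32b")

MAX_CONTEXT = 32_768          # assumed context window, in tokens
RESERVED_FOR_OUTPUT = 1_024   # illustrative budget for the model's reply

def fits_in_context(prompt: str) -> bool:
    """Return True if the prompt leaves room for generation within the window."""
    n_prompt_tokens = len(tokenizer.encode(prompt))
    return n_prompt_tokens + RESERVED_FOR_OUTPUT <= MAX_CONTEXT

print(fits_in_context("Summarize the following report: ..."))
```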

Benchmark Scores

Category    Benchmark     Shots   Metric     Distill-Qwen-32B
General     MMLU (dev)    5       Accuracy   73.4
Reasoning   CMMLU (dev)   5       Accuracy   63.1
Math        GSM8K (dev)   8       Accuracy   83.6
Code        HumanEval     0       Pass@1     80.7

The model offers a favorable tradeoff between performance and inference cost, particularly on math, reasoning, and code tasks.
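
For context on the metrics above: MMLU, CMMLU, and GSM8K report exact-match accuracy over the benchmark's answer set, while HumanEval's Pass@1 requires executing generated code against unit tests. The snippet below illustrates only the accuracy computation, on made-up data.

```python
# Illustrative only: exact-match accuracy, as reported for MMLU, CMMLU, and GSM8K above.
# The predictions and references here are invented; HumanEval's Pass@1 additionally
# requires running the generated code against unit tests and is not shown.
def exact_match_accuracy(predictions, references):
    """Percentage of predictions that exactly match their reference answer."""
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

preds = ["42", "Paris", "17"]
golds = ["42", "Paris", "19"]
print(f"{exact_match_accuracy(preds, golds):.1f}")  # 66.7
```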


References