deepseek-ai/deepseek-r1-distill-qwen-32b¶
Model Information¶
deepseek-ai/deepseek-r1-distill-qwen-32b
is a distilled large language model based on Qwen2.5-32B, released by DeepSeek. It is fine-tuned on reasoning data generated by DeepSeek-R1 and is optimized for reasoning, code, and math, offering faster inference and lower memory usage than the full DeepSeek-R1 model.
- Model Developer: DeepSeek AI
- Model Release Date: January 2025
- Supported Languages: English and Chinese
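The snippet below is a minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub under the id shown above (exact repo casing may differ) and is loaded through the standard transformers AutoTokenizer / AutoModelForCausalLM API; adjust dtype and device placement to your hardware.

```python
# Minimal generation sketch with Hugging Face transformers.
# Assumes the checkpoint id from this card resolves on the Hub and that
# enough GPU memory is available (device_map="auto" shards across GPUs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-r1-distill-qwen-32b"  # id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to cut memory use
    device_map="auto",           # place layers across available devices
)

# Chat-style prompt; the tokenizer's chat template formats it for the model.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids, max_new_tokens=512, do_sample=True, temperature=0.6
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```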
Model Architecture¶
deepseek-ai/deepseek-r1-distill-qwen-32b
uses a dense, decoder-only transformer architecture inherited from Qwen2.5-32B, with reasoning behavior distilled from DeepSeek-R1. It retains strong performance while being considerably more resource-efficient to serve than the full teacher model.
- Model Type: Decoder-only transformer
- Base Model: Qwen2.5-32B
- Distilled By: DeepSeek AI
- Parameters: Approximately 32B
- Context Length: 32K tokens
- Training:
- Supervised fine-tuning of Qwen2.5-32B on reasoning data generated by DeepSeek-R1
- Fine-tuned for reasoning, math, and code tasks in English and Chinese
- Tokenizer: Compatible with the Qwen2.5 tokenizer
- Key Strengths:
- Instruction following
- Math and code generation
- Balanced performance and compute efficiency
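As a quick sanity check of the figures listed above, the sketch below reads them out of the published checkpoint's config; it assumes a standard Qwen2-style transformers config and the repo id used in this card.

```python
# Inspect architecture details straight from the checkpoint's config file.
# Field names assume a Qwen2-style transformers config; the printed values
# come from whatever the published checkpoint actually ships with.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("deepseek-ai/deepseek-r1-distill-qwen-32b")

print(config.model_type)               # expected "qwen2": decoder-only transformer
print(config.num_hidden_layers)        # transformer depth
print(config.hidden_size)              # model width
print(config.max_position_embeddings)  # maximum context length in tokens
```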
Benchmark Scores¶
| Category | Benchmark | Shots | Metric | Distill-Qwen-32B |
|---|---|---|---|---|
| General | MMLU (dev) | 5 | Accuracy | 73.4 |
| Chinese | CMMLU (dev) | 5 | Accuracy | 63.1 |
| Math | GSM8K (dev) | 8 | Accuracy | 83.6 |
| Code | HumanEval | 0 | Pass@1 | 80.7 |
The model offers a strong tradeoff between performance and cost, especially for math, reasoning, and code tasks.
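For readers unfamiliar with the Shots column, the sketch below shows how an n-shot evaluation prompt is typically assembled: n worked exemplars are prepended before the unanswered test question. The exemplars here are placeholders, not the official prompt set of any benchmark above.

```python
# Illustration of n-shot prompting: prepend n solved exemplars, then the
# test question with an empty answer slot for the model to complete.

def build_few_shot_prompt(exemplars: list[tuple[str, str]], question: str) -> str:
    """Concatenate (question, answer) exemplars followed by the test question."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# Placeholder exemplars; an 8-shot GSM8K run would use 8 fixed worked examples.
exemplars = [
    ("A pen costs $2 and a notebook costs $3. What do one of each cost together?",
     "2 + 3 = 5. The answer is 5."),
]
print(build_few_shot_prompt(exemplars, "If 4 apples cost $12, how much do 7 apples cost?"))
```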