deepseek-ai/deepseek-r1-distill-qwen-32b¶
Model Information¶
deepseek-ai/deepseek-r1-distill-qwen-32b
is a distilled large language model based on Qwen2.5-32B, released by DeepSeek. It is fine-tuned on reasoning data generated by DeepSeek-R1 and is optimized for reasoning, code, and math, offering faster inference and lower memory usage than the full DeepSeek-R1 model.
- Model Developer: DeepSeek AI
- Model Release Date: January 2025
- Supported Languages: English and Chinese
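The snippet below is a minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub under the id shown above (exact repo casing may differ) and is loaded through the standard transformers AutoTokenizer / AutoModelForCausalLM API; adjust dtype and device placement to your hardware.

```python
# Minimal generation sketch with Hugging Face transformers.
# Assumes the checkpoint id from this card resolves on the Hub and that
# enough GPU memory is available (device_map="auto" shards across GPUs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-r1-distill-qwen-32b"  # id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to cut memory use
    device_map="auto",           # place layers across available devices
)

# Chat-style prompt; the tokenizer's chat template formats it for the model.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids, max_new_tokens=512, do_sample=True, temperature=0.6
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```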
Model Architecture¶
deepseek-ai/deepseek-r1-distill-qwen-32b
uses a dense, decoder-only transformer architecture inherited from Qwen2.5-32B, with reasoning behavior distilled from DeepSeek-R1. It retains strong performance while being considerably more resource-efficient to serve than the full teacher model.
- Model Type: Decoder-only transformer
- Base Model: Qwen2.5-32B
- Distilled By: DeepSeek AI
- Parameters: Approximately 32B
- Context Length: 32K tokens
- Training:
- Supervised fine-tuning of Qwen2.5-32B on reasoning data generated by DeepSeek-R1
- Fine-tuned for reasoning, math, and code tasks in English and Chinese
- Tokenizer: Compatible with the Qwen2.5 tokenizer
- Key Strengths:
- Instruction following
- Math and code generation
- Balanced performance and compute efficiency
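As a quick sanity check of the figures listed above, the sketch below reads them out of the published checkpoint's config; it assumes a standard Qwen2-style transformers config and the repo id used in this card.

```python
# Inspect architecture details straight from the checkpoint's config file.
# Field names assume a Qwen2-style transformers config; the printed values
# come from whatever the published checkpoint actually ships with.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("deepseek-ai/deepseek-r1-distill-qwen-32b")

print(config.model_type)               # expected "qwen2": decoder-only transformer
print(config.num_hidden_layers)        # transformer depth
print(config.hidden_size)              # model width
print(config.max_position_embeddings)  # maximum context length in tokens
```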
Benchmark Scores¶
| Category | Benchmark | Shots | Metric | Distill-Qwen-32B |
|---|---|---|---|---|
| General | MMLU (dev) | 5 | Accuracy | 73.4 |
| Chinese | CMMLU (dev) | 5 | Accuracy | 63.1 |
| Math | GSM8K (dev) | 8 | Accuracy | 83.6 |
| Code | HumanEval | 0 | Pass@1 | 80.7 |
The model offers a strong tradeoff between performance and cost, especially for math, reasoning, and code tasks.
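For readers unfamiliar with the Shots column, the sketch below shows how an n-shot evaluation prompt is typically assembled: n worked exemplars are prepended before the unanswered test question. The exemplars here are placeholders, not the official prompt set of any benchmark above.

```python
# Illustration of n-shot prompting: prepend n solved exemplars, then the
# test question with an empty answer slot for the model to complete.

def build_few_shot_prompt(exemplars: list[tuple[str, str]], question: str) -> str:
    """Concatenate (question, answer) exemplars followed by the test question."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# Placeholder exemplars; an 8-shot GSM8K run would use 8 fixed worked examples.
exemplars = [
    ("A pen costs $2 and a notebook costs $3. What do one of each cost together?",
     "2 + 3 = 5. The answer is 5."),
]
print(build_few_shot_prompt(exemplars, "If 4 apples cost $12, how much do 7 apples cost?"))
```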