openai/gpt-oss-20b
Model Information
openai/gpt-oss-20b is a mid-sized, open-weight model in OpenAI's gpt-oss family, created to balance reasoning strength, adaptability, and deployment efficiency. It is engineered to run smoothly on commonly available hardware while still supporting advanced features like chain-of-thought prompting, configurable reasoning levels, and native tool-use integration.
This model is particularly well-suited for developers and researchers seeking a powerful yet cost-efficient foundation for production workloads, fine-tuning, and experimentation without requiring large-scale infrastructure.
- Model Developer: OpenAI
- Model Release Date: August 2025
- Supported Languages: English (primary), with generalization across multiple languages
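The configurable reasoning levels mentioned above are conventionally selected through the system prompt. The sketch below builds a chat-style request with a chosen reasoning level; the helper name is hypothetical, and the message shapes follow the common chat convention rather than any specific client library:

```python
def build_messages(user_prompt: str, reasoning: str = "medium") -> list[dict]:
    """Build a chat-style message list with a reasoning-level system prompt.

    The reasoning level ("low", "medium", or "high") trades latency for
    deeper chain-of-thought; it is set in the system message by convention.
    """
    if reasoning not in {"low", "medium", "high"}:
        raise ValueError(f"unknown reasoning level: {reasoning}")
    return [
        {"role": "system", "content": f"Reasoning: {reasoning}"},
        {"role": "user", "content": user_prompt},
    ]

msgs = build_messages("Summarize mixture-of-experts routing.", reasoning="high")
print(msgs[0]["content"])  # Reasoning: high
```

The resulting message list can then be passed to whatever chat-completion interface serves the model, which renders it into the model's expected prompt format.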
Model Architecture
The openai/gpt-oss-20b is structured as a sparse Mixture-of-Experts (MoE) Transformer, optimized to deliver strong reasoning ability without the heavy infrastructure demands of very large models. By activating only a small number of experts per token, it balances efficiency and adaptability, making it well-suited for research, prototyping, and production in environments with limited GPU capacity.
- Type: Decoder-only Transformer (MoE)
- Total Parameters: 21B (~3.6B active per token)
- Layers: 24, with 32 experts per layer (4 active per token)
- Context Length: Up to 128K tokens
- Attention: Alternating dense and locally banded sparse attention, with grouped multi-query attention and Rotary Position Embeddings (RoPE)
- Quantization: MXFP4 post-training quantization of the MoE weights, allowing deployment within 16 GB of memory (e.g., a single consumer or workstation GPU)
- Training Format: Harmony response format (supports structured, reliable outputs)
- Reasoning Levels: Adjustable — low, medium, high
- Core Capabilities: Function calling, tool integration, Python execution, structured outputs
- Fine-tuning: Supported on consumer-grade hardware (single-GPU setups)
- License: Apache 2.0
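The sparse-MoE design listed above means a router scores every expert for each token but only the top-k experts actually run, so per-token compute scales with the number of active experts rather than the total expert count. A minimal, illustrative sketch of top-k routing (toy expert counts, plain Python; this stands in for the model's actual fused kernels):

```python
import math
import random

def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and renormalize their weights.

    Returns (expert_index, mixing_weight) pairs; only these k experts
    are evaluated for the token, and their outputs are combined by weight.
    """
    idx = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in idx])
    return list(zip(idx, weights))

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(8)]  # toy router scores, 8 experts
routes = top_k_route(logits, k=2)                    # only 2 experts run for this token
assert len(routes) == 2
assert abs(sum(w for _, w in routes) - 1.0) < 1e-9   # mixing weights sum to 1
```

Because the non-selected experts are skipped entirely, the active-parameter count per token is a small fraction of the total parameter count, which is what makes the model deployable on modest hardware.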
Benchmark Scores
| Category | Benchmark | Metric | gpt-oss-20b (Low / Med / High) |
|---|---|---|---|
| General Knowledge | MMLU (no tools) | Accuracy | 75.2 / 80.5 / 84.1 |
| Competition Math | AIME 2024 (no tools) | Accuracy | 41.8 / 63.4 / 78.9 |
| Competition Math | AIME 2024 (with tools) | Accuracy | 59.7 / 77.5 / 88.3 |
| Competition Math | AIME 2025 (no tools) | Accuracy | 39.1 / 62.0 / 75.4 |
| Competition Math | AIME 2025 (with tools) | Accuracy | 58.2 / 80.3 / 89.5 |
| Science Reasoning | GPQA Diamond (no tools) | Accuracy | 55.9 / 61.2 / 68.7 |
| Science Reasoning | GPQA Diamond (with tools) | Accuracy | 57.0 / 62.1 / 70.1 |
| Programming | Codeforces (no tools) | Elo | 1422 / 1820 / 2050 |
| Programming | Codeforces (with tools) | Elo | 1489 / 1930 / 2167 |
| Health Domain | HealthBench | Accuracy | 47.3 / 50.1 / 52.9 |
The model balances efficiency and reasoning power, showing strong gains when combined with tool use across math, science, and programming domains.
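The tool-use gains above come from a function-calling loop: the model emits a structured call, the host executes it, and the result is fed back for the final answer. A minimal sketch of the host-side dispatch step, with a hypothetical tool registry and a mocked model output standing in for a real completion:

```python
import json

# Hypothetical tool registry; in practice the schemas are sent to the model,
# which emits calls matching them.
TOOLS = {
    "add": {
        "description": "Add two integers.",
        "parameters": {"a": "int", "b": "int"},
        "fn": lambda a, b: a + b,
    }
}

def dispatch(tool_call_json: str) -> str:
    """Execute a model-emitted call of the form {"name": ..., "arguments": {...}}
    and return a JSON result to append to the conversation."""
    call = json.loads(tool_call_json)
    tool = TOOLS[call["name"]]
    result = tool["fn"](**call["arguments"])
    return json.dumps({"name": call["name"], "result": result})

# Mocked model output standing in for a real tool-call completion:
model_output = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
print(dispatch(model_output))  # {"name": "add", "result": 5}
```

On benchmarks such as AIME, this loop lets the model offload exact computation (e.g., Python execution) instead of reasoning through arithmetic in text, which is where the "with tools" rows gain most of their edge.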