# openai/gpt-oss-120b

## Model Information
`openai/gpt-oss-120b` is the larger variant in OpenAI's open-weight gpt-oss series, designed for reasoning-intensive, agentic, and production-scale applications. It is optimized to run on a single 80 GB GPU through a Mixture-of-Experts (MoE) architecture and provides developers with access to chain-of-thought reasoning, configurable reasoning levels, and native tool-use capabilities.
- Model Developer: OpenAI
- Model Release Date: August 2025
- Supported Languages: Primarily English; training emphasizes STEM and general-knowledge coverage
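The reasoning level mentioned above is selected per request rather than baked into the weights. A minimal sketch of how a chat request might set it through the system prompt, assuming the `Reasoning: <level>` system-message convention from the harmony format; the `build_messages` helper is hypothetical, and the exact special tokens are produced by the model's chat template, not by this code:

```python
def build_messages(question: str, reasoning: str = "high") -> list[dict]:
    """Build a chat-message list that requests a given reasoning level.

    The system prompt carries the reasoning setting; the model's harmony
    chat template turns these dicts into the actual token sequence.
    """
    if reasoning not in ("low", "medium", "high"):
        raise ValueError(f"unknown reasoning level: {reasoning}")
    return [
        {"role": "system", "content": f"Reasoning: {reasoning}"},
        {"role": "user", "content": question},
    ]

messages = build_messages("Summarize the MoE architecture.", reasoning="medium")
```

Higher levels trade latency for longer chains of thought, so "low" suits simple lookups while "high" suits competition-math-style problems.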
## Model Architecture
The `openai/gpt-oss-120b` model is implemented as a sparse Mixture-of-Experts (MoE) Transformer: only a subset of experts is active for each token, reducing compute cost while maintaining high reasoning performance.
- Type: Decoder-only Transformer (MoE)
- Total Parameters: 117B (~5.1B active per token)
- Layers: 36, with 128 experts per layer (4 active)
- Context Length: Up to 128K tokens
- Attention: Multi-Head Self-Attention with Rotary Position Embeddings (RoPE)
- Quantization: MXFP4 (post-training), optimized for 80 GB GPUs (e.g., NVIDIA H100, AMD MI300X)
- Training Format: Harmony response format (required for correct outputs)
- Reasoning Levels: Configurable — low, medium, high
- Core Capabilities: Function calling, web browsing, Python execution, structured outputs
- Fine-tuning: Supported on a single H100 node
- License: Apache 2.0
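The sparsity figures above (128 experts per layer, 4 active, ~5.1B of 117B parameters used per token) come from top-k routing: a per-token router scores every expert and only the k best are evaluated. A minimal sketch of that selection step, assuming softmax-normalized weights over the chosen experts; the real router is a learned linear layer inside each MoE block, not random scores:

```python
import numpy as np

def top_k_route(router_logits: np.ndarray, k: int = 4):
    """Select the k highest-scoring experts and softmax their logits.

    Returns expert indices (best first) and mixing weights summing to 1.
    Only these k experts run for the token; the other experts are skipped.
    """
    idx = np.argsort(router_logits)[-k:][::-1]       # top-k, descending
    shifted = router_logits[idx] - router_logits[idx].max()
    weights = np.exp(shifted)
    weights /= weights.sum()                         # normalize over top-k only
    return idx, weights

rng = np.random.default_rng(0)
logits = rng.normal(size=128)            # one router score per expert (stand-in)
experts, weights = top_k_route(logits)   # 4 of 128 experts fire for this token
```

This is why the per-token compute tracks the ~5.1B active parameters rather than the full 117B: each token's output is a weighted sum of just the selected experts.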
## Benchmark Scores
| Category | Benchmark | Metric | gpt-oss-120b (Low / Med / High) |
|---|---|---|---|
| General Knowledge | MMLU (no tools) | Accuracy | 85.9 / 88.0 / 90.0 |
| Competition Math | AIME 2024 (no tools) | Accuracy | 56.3 / 80.4 / 95.8 |
| Competition Math | AIME 2024 (with tools) | Accuracy | 75.4 / 87.9 / 96.6 |
| Competition Math | AIME 2025 (no tools) | Accuracy | 50.4 / 80.0 / 92.5 |
| Competition Math | AIME 2025 (with tools) | Accuracy | 72.9 / 91.6 / 97.9 |
| Science Reasoning | GPQA Diamond (no tools) | Accuracy | 67.1 / 73.1 / 80.1 |
| Science Reasoning | GPQA Diamond (with tools) | Accuracy | 68.1 / 73.5 / 80.9 |
| Programming | Codeforces (no tools) | Elo | 1595 / 2205 / 2463 |
| Programming | Codeforces (with tools) | Elo | 1653 / 2365 / 2622 |
| Health Domain | HealthBench | Accuracy | 53.0 / 55.9 / 57.6 |
The model performs strongly across reasoning, math, science, and programming tasks at every reasoning level. Enabling tool use improves results further, bringing performance close to parity with comparable proprietary models.