openai/gpt-oss-20b

Model Information

openai/gpt-oss-20b is the smaller of the two open-weight models in OpenAI’s gpt-oss family (alongside gpt-oss-120b), designed to balance reasoning strength, adaptability, and deployment efficiency. It runs on commonly available hardware while still supporting full chain-of-thought reasoning, configurable reasoning levels, and native tool-use integration.

This model is particularly well-suited for developers and researchers seeking a powerful yet cost-efficient foundation for production workloads, fine-tuning, and experimentation without requiring large-scale infrastructure.

  • Model Developer: OpenAI
  • Model Release Date: August 5, 2025
  • Supported Languages: English (primary), with generalization across multiple languages
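
As a quick orientation, here is a minimal usage sketch with the Hugging Face transformers library. The generation settings and device mapping are assumptions rather than official recommendations; the reasoning level is requested in the system message, following published gpt-oss usage examples.

```python
# Minimal quick-start sketch (settings are illustrative assumptions).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",   # pick an appropriate dtype automatically
    device_map="auto",    # spread the model across available devices
)

messages = [
    # Select the reasoning level (low / medium / high) via the system message.
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Explain why the sky is blue in two sentences."},
]

result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```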

Model Architecture

openai/gpt-oss-20b is a sparse Mixture-of-Experts (MoE) Transformer, optimized to deliver strong reasoning without the infrastructure demands of much larger dense models. Because only a few experts are activated for each token, per-token compute stays low, making the model well suited to research, prototyping, and production in environments with limited GPU capacity.
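
To make the sparse routing concrete, here is a small illustrative NumPy sketch of top-k expert selection. It mirrors the 32-experts / 4-active configuration listed below, but it is a simplified illustration, not the actual gpt-oss implementation.

```python
# Illustrative top-k MoE routing sketch (not the real gpt-oss code).
import numpy as np

def moe_layer(x, gate_w, expert_ws, k=4):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ gate_w                          # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, topk[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                 # softmax over selected experts only
        for w, e in zip(weights, topk[t]):
            out[t] += w * (x[t] @ expert_ws[e])  # weighted sum of expert outputs
    return out

rng = np.random.default_rng(0)
d, n_experts = 64, 32
x = rng.standard_normal((8, d))                  # 8 tokens of width d
gate_w = rng.standard_normal((d, n_experts))
expert_ws = rng.standard_normal((n_experts, d, d))
y = moe_layer(x, gate_w, expert_ws)              # only 4 of 32 experts run per token
```

Only the selected experts' weights participate in each token's forward pass, which is why a model with ~21B total parameters needs only a few billion active parameters per token.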

  • Type: Decoder-only Transformer (MoE)
  • Total Parameters: 21B (~3.6B active per token)
  • Layers: 24, with 32 experts per layer (4 active per token)
  • Context Length: Up to 128K tokens
  • Attention: Grouped multi-query attention with Rotary Position Embeddings (RoPE), alternating dense and locally banded sparse layers
  • Quantization: MXFP4 post-training quantization of the MoE weights, allowing the model to run within 16 GB of memory
  • Training Format: Harmony response format (supports structured, reliable outputs)
  • Reasoning Levels: Adjustable — low, medium, high
  • Core Capabilities: Function calling, tool integration, Python execution, structured outputs
  • Fine-tuning: Supported on a single GPU, including consumer-grade hardware
  • License: Apache 2.0
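
Because the model is trained for native function calling, it works with OpenAI-compatible tool-calling APIs. The sketch below assumes the model is served locally behind an OpenAI-compatible endpoint (for example via vLLM); the URL and the get_weather tool are illustrative assumptions.

```python
# Hedged tool-calling sketch against an assumed local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # structured call, or None if not used
```

If the model decides to use the tool, tool_calls carries the function name and JSON-encoded arguments; your code executes the function and returns the result in a follow-up message.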

Benchmark Scores

Category            Benchmark                   Metric    gpt-oss-20b (Low / Med / High reasoning)
General Knowledge   MMLU (no tools)             Accuracy  75.2 / 80.5 / 84.1
Competition Math    AIME 2024 (no tools)        Accuracy  41.8 / 63.4 / 78.9
Competition Math    AIME 2024 (with tools)      Accuracy  59.7 / 77.5 / 88.3
Competition Math    AIME 2025 (no tools)        Accuracy  39.1 / 62.0 / 75.4
Competition Math    AIME 2025 (with tools)      Accuracy  58.2 / 80.3 / 89.5
Science Reasoning   GPQA Diamond (no tools)     Accuracy  55.9 / 61.2 / 68.7
Science Reasoning   GPQA Diamond (with tools)   Accuracy  57.0 / 62.1 / 70.1
Programming         Codeforces (no tools)       Elo       1422 / 1820 / 2050
Programming         Codeforces (with tools)     Elo       1489 / 1930 / 2167
Health Domain       HealthBench                 Score     47.3 / 50.1 / 52.9

Scores rise consistently with higher reasoning levels, and tool use adds further gains, most notably on competition math (roughly 9 to 19 points on AIME) and competitive programming (roughly 70 to 120 Elo on Codeforces).

