# meta-llama/Llama-4-Maverick-17B-128E-Instruct

## Model Information
meta-llama/Llama-4-Maverick-17B-128E-Instruct is an instruction-tuned model developed by Meta as part of the Llama 4 "Maverick" series. With 17B active parameters routed across 128 experts (roughly 400B parameters in total), it is designed to deliver high performance at a fraction of the inference cost of larger dense models. It demonstrates strong generalization across multilingual, coding, and reasoning tasks while remaining efficient enough for scalable deployment.
- Model Developer: Meta
- Model Release Date: April 2025
- Supported Languages: English (primary), with broad multilingual generalization including French, Spanish, German, Portuguese, Japanese, Korean, and Hindi
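
A minimal usage sketch is shown below, assuming the Hugging Face `transformers` library and access to the gated checkpoint; the loading options, dtype, and hardware requirements will vary with your setup:

```python
# Minimal chat sketch using the transformers text-generation pipeline.
# Assumes the gated checkpoint is accessible and that device_map/dtype
# suit your hardware; this is an illustration, not a deployment recipe.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct",
    device_map="auto",
    torch_dtype="bfloat16",
)

messages = [
    {"role": "user", "content": "Explain mixture-of-experts routing in two sentences."}
]
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # assistant reply
```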
## Model Architecture
meta-llama/Llama-4-Maverick-17B-128E-Instruct uses a Mixture-of-Experts (MoE) architecture: only a small fraction of its parameters is active for any given token, which keeps inference compute low while preserving the capacity of a much larger model.
Key Features:
- Model Type: Decoder-only Transformer
- Parameter Count: 17B active parameters across 128 experts (~400B total)
- MoE Routing: Sparse activation; each token is processed by a small number of experts rather than the full model (see the sketch after this list)
- Context Length: Up to 1M tokens
- Training Techniques:
  - Instruction tuning on curated multi-task datasets
  - Reinforcement Learning from Human Feedback (RLHF)
  - Safety alignment and toxicity mitigation
- Tokenizer: Extended version of the Llama 3 tokenizer
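
To make the routing idea concrete, here is a generic sparse-MoE layer sketch in PyTorch: a router scores all experts, each token is dispatched to its top-scoring routed expert plus an always-on shared expert, and the outputs are combined. The dimensions and structure are illustrative assumptions, not Meta's actual implementation:

```python
# Illustrative sparse MoE layer: top-1 routed expert + always-on shared expert.
# Toy dimensions; the real Llama 4 layers differ in shape and use fused kernels.
import torch
import torch.nn as nn

def ffn(d_model, d_ff):
    return nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(ffn(d_model, d_ff) for _ in range(n_experts))
        self.shared_expert = ffn(d_model, d_ff)

    def forward(self, x):                          # x: (n_tokens, d_model)
        probs = self.router(x).softmax(dim=-1)     # (n_tokens, n_experts)
        weight, expert_id = probs.max(dim=-1)      # top-1 routing per token
        out = self.shared_expert(x)                # shared expert sees every token
        for e in expert_id.unique().tolist():      # dispatch token groups to experts
            mask = expert_id == e
            out[mask] = out[mask] + weight[mask, None] * self.experts[e](x[mask])
        return out

layer = SparseMoELayer()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Only the selected experts run for each token, so per-token FLOPs track the active parameter count rather than the total.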
The Maverick design combines Mixture-of-Experts scalability with general-purpose instruction following, making it well suited to serving workloads in compute-constrained environments.
## Benchmark Scores
| Category     | Benchmark            | Shots | Metric           | Llama 4 Maverick 17B-128E Instruct |
|--------------|----------------------|-------|------------------|------------------------------------|
| General      | MMLU (CoT)           | 0     | Acc. (avg)       | 86.5 |
| General      | MMLU Pro (CoT)       | 5     | Acc. (avg)       | 58.6 |
| Steerability | IFEval               | –     | –                | 91.3 |
| Reasoning    | GPQA Diamond (CoT)   | 0     | Accuracy         | 45.3 |
| Code         | HumanEval            | 0     | Pass@1           | 83.7 |
| Code         | MBPP EvalPlus (base) | 0     | Pass@1           | 84.1 |
| Math         | MATH (CoT)           | 0     | Sympy Score      | 58.3 |
| Tool Use     | BFCL v2              | 0     | AST Macro Avg.   | 79.4 |
| Multilingual | MGSM                 | 0     | EM (exact match) | 76.8 |
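
For the code rows, Pass@1 is conventionally computed with the unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021); a short reference implementation:

```python
# Unbiased pass@k estimator (Chen et al., 2021): with n samples per task,
# c of which pass the tests, pass@k = 1 - C(n-c, k) / C(n, k).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:  # every size-k draw contains at least one passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=10, c=4, k=1))  # 0.4 — for k=1 this reduces to c/n
```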
Llama 4 Maverick 17B-128E sets a high bar for compute-efficient instruction-following models, offering near-flagship quality at a much smaller active-parameter footprint.