MMLU: 2026 AI Leaderboard
The 57-subject knowledge test that became the default LLM benchmark.
What it tests
MMLU (Massive Multitask Language Understanding) is a multiple-choice exam of roughly 14,000 test questions spanning 57 subjects, from elementary mathematics to professional law. It primarily measures how much a language model knows across those subjects, rather than how well it reasons through long multi-step problems.
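To make the format concrete, here is a minimal Python sketch of how one four-choice item might be represented and turned into a zero-shot or few-shot prompt. The `MMLUItem` class and the `format_item` and `build_prompt` helpers are hypothetical names for illustration. The header wording is loosely modeled on the prompt style common in MMLU evaluation code, and real harnesses vary in exact formatting.

```python
from dataclasses import dataclass

CHOICE_LABELS = ("A", "B", "C", "D")


@dataclass
class MMLUItem:
    """One four-choice question (hypothetical schema, not an official one)."""
    subject: str                       # e.g. "college_biology"
    question: str
    choices: tuple[str, str, str, str]
    answer: str                        # correct label, one of "A".."D"


def format_item(item: MMLUItem, include_answer: bool) -> str:
    """Render a question, its lettered choices, and an 'Answer:' line."""
    lines = [item.question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICE_LABELS, item.choices)]
    # Worked examples show the answer; the target question leaves it blank.
    lines.append("Answer:" + (f" {item.answer}\n" if include_answer else ""))
    return "\n".join(lines)


def build_prompt(target: MMLUItem, shots: list[MMLUItem]) -> str:
    """Zero-shot when `shots` is empty, k-shot otherwise."""
    subject = target.subject.replace("_", " ")
    header = f"The following are multiple choice questions (with answers) about {subject}.\n\n"
    examples = "".join(format_item(s, include_answer=True) + "\n" for s in shots)
    return header + examples + format_item(target, include_answer=False)


# Usage: a zero-shot prompt for a single (made-up) biology question.
q = MMLUItem(
    "college_biology",
    "Which organelle produces most of a cell's ATP?",
    ("Nucleus", "Ribosome", "Mitochondrion", "Golgi apparatus"),
    "C",
)
print(build_prompt(q, shots=[]))
```

The model's reply (or its highest-probability next token) after the final `Answer:` is then compared with the stored label.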
How it is scored
Models answer four-choice questions in a zero-shot or few-shot setting. The reported score is average accuracy across all subjects. Scores above 85% are considered strong. For reference, the original paper estimates expert-level human accuracy at roughly 89.8%, while non-specialist crowdworkers scored about 34.5%.
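"Average accuracy across all subjects" hides a choice: pool every question (micro-average) or score each subject and then average the per-subject scores (macro-average). Because subjects contain different numbers of questions, the two can differ, and evaluation harnesses do not all make the same choice. A minimal sketch, assuming per-question results are available as hypothetical `(subject, is_correct)` pairs:

```python
from collections import defaultdict


def mmlu_accuracy(results: list[tuple[str, bool]]) -> tuple[float, float]:
    """Return (micro, macro) accuracy.

    micro: correct answers / total questions, pooling all subjects.
    macro: mean of per-subject accuracies, so each subject counts equally.
    """
    if not results:
        raise ValueError("no results to score")
    # subject -> [correct, total]
    per_subject: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for subject, correct in results:
        per_subject[subject][0] += int(correct)
        per_subject[subject][1] += 1

    total_correct = sum(c for c, _ in per_subject.values())
    total = sum(n for _, n in per_subject.values())
    micro = total_correct / total
    macro = sum(c / n for c, n in per_subject.values()) / len(per_subject)
    return micro, macro


# Usage: 1/4 correct in a larger "law" subject, 2/2 in a small "math" subject.
toy = [("law", True)] + [("law", False)] * 3 + [("math", True)] * 2
print(mmlu_accuracy(toy))
```

On this toy input the micro-average is 0.5 and the macro-average is 0.625, because under macro-averaging the two-question `math` subject weighs as much as the four-question `law` subject.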
Why it matters
MMLU is the most widely reported LLM benchmark, which makes it the easiest point of apples-to-apples comparison across vendors. Its weakness is saturation: frontier models now cluster in the upper 80s and low 90s, so the small gaps between top models (often a few tenths of a point) are within statistical noise. Use it to rule out weak models, not to pick a winner among strong ones.
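A rough way to check whether a gap between two scores exceeds sampling noise is to treat each question as an independent trial. The sketch below assumes n = 14,042 (the size of the MMLU test split) and uses a normal-approximation interval plus an unpaired two-proportion z statistic. Both are simplifications: since every model answers the same questions, a paired test such as McNemar's on per-question results would be more appropriate, and neither accounts for differences in prompting setup between vendor-reported scores. The function names are illustrative.

```python
import math

N_TEST = 14_042  # assumed number of MMLU test questions


def accuracy_ci95(acc: float, n: int = N_TEST) -> tuple[float, float]:
    """Wald (normal-approximation) 95% interval, treating questions as independent trials."""
    se = math.sqrt(acc * (1 - acc) / n)
    return acc - 1.96 * se, acc + 1.96 * se


def gap_z_score(acc_a: float, acc_b: float, n: int = N_TEST) -> float:
    """Unpaired two-proportion z statistic with a pooled proportion.

    Assumes both models were scored on the same number of questions n.
    """
    pooled = (acc_a + acc_b) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return (acc_a - acc_b) / se


lo, hi = accuracy_ci95(0.91)
print(f"91% accuracy: 95% CI ~ [{lo:.3f}, {hi:.3f}]")
print(f"91.3% vs 91.0%: z ~ {gap_z_score(0.913, 0.910):.2f}")
```

Under these assumptions the interval around 91% is about plus or minus half a point, and a 0.3-point gap such as 91.3% vs 91.0% gives z of roughly 0.9, well under the 1.96 cutoff for 95% significance. A gap of a full point would clear that cutoff on sampling error alone, but differences in how each vendor ran the test can still swamp gaps of that size.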
Leaderboard (10 models)
Sorted by MMLU score. The Tier column shows each tool's overall AIToolTier rank, which blends this benchmark with pricing, features, and real-world usability.
| # | Tool | Model | Tier | MMLU score | Variant | Overall |
|---|---|---|---|---|---|---|
| 1 | Claude (Anthropic) | Claude Opus 4.7 (Opus 4.6 baseline scores shown; 4.7 announced a 13% coding lift and 3x production task completion) | A | 91.3% | MMLU | 8.5/10 |
| 2 | ChatGPT | GPT-5.4 | A | 91% | MMLU | 8.8/10 |
| 3 | DeepSeek | DeepSeek V3.2 | A | 90.8% | MMLU | 8.0/10 |
| 4 | Gemini (Google) | Gemini 3.1 Ultra | A | 90.5% | MMLU | 8.3/10 |
| 5 | Muse Spark (Meta) | Muse Spark | A | 89% | MMLU | 8.8/10 |
| 6 | Grok | Grok 4.20 | B | 88.5% | MMLU | 7.5/10 |
| 7 | Nemotron (Nvidia) | Nemotron 3 Ultra (253B) | B | 88.4% | MMLU (Llama-Nemotron 70B) | 7.8/10 |
| 8 | Mistral AI | Mistral Large 3 / Small 4 | B | 86% | MMLU | 7.5/10 |
| 9 | Gemma 4 (Google) | Gemma 4 31B | A | 83% | MMLU | 8.3/10 |
| 10 | Falcon (TII) | Falcon 3 10B | B | 73.1% | MMLU | 7.1/10 |
About MMLU
- Creator: Hendrycks et al., 2020 (UC Berkeley)
- Unit: % (max 100)
- Official source: https://arxiv.org/abs/2009.03300