Knowledge

MMLU-Pro: 2026 AI Leaderboard

MMLU's harder successor: 10 answer choices and more reasoning.

What it tests

MMLU-Pro is a successor to MMLU that expands questions to as many as 10 answer choices (up from 4) and shifts the question bank toward multi-step reasoning rather than pure recall.

How it is scored

Scoring uses the same accuracy metric as MMLU (the percentage of questions answered correctly), applied to the harder, reformulated question bank. Frontier models score roughly 10-20 points lower here than on base MMLU.
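As a rough illustration of the accuracy metric, the sketch below scores a toy set of answers and compares the random-guess baseline for 4 versus 10 answer choices. The function names and the sample data are hypothetical and chosen only for this example; this is not the official MMLU-Pro evaluation harness.

```python
def accuracy(predictions, answers):
    """Percentage of questions where the predicted letter matches the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must be the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def chance_accuracy(num_choices):
    """Expected accuracy (%) of uniform random guessing over num_choices options."""
    return 100.0 / num_choices


# Hypothetical toy data: letters A-J cover up to 10 answer choices.
answers = ["C", "J", "A", "F", "B"]
predictions = ["C", "J", "D", "F", "H"]

print(accuracy(predictions, answers))  # 3 of 5 correct -> 60.0
print(chance_accuracy(4))              # 4-choice baseline -> 25.0
print(chance_accuracy(10))             # 10-choice baseline -> 10.0
```

With 10 options the guessing floor drops from 25% to 10%, so lucky guesses inflate a score less than they would on a 4-choice question.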

Why it matters

Base MMLU has saturated, so it no longer separates the strongest models. MMLU-Pro still has headroom, making it a better discriminator for top-tier models in 2026.

Leaderboard (7 models)

Sorted by MMLU-Pro score, highest first. The Tier column shows each tool's overall AIToolTier rank, which blends this benchmark with pricing, features, and real-world usability.

#  Tool                   Model                                Tier  MMLU-Pro score
1  DeepSeek               DeepSeek V3.2                        A     85%
2  Kimi K2.5 (Moonshot)   Kimi K2.5 (1T/32B active MoE)        A     84.8%
3  Qwen (Alibaba)         Qwen3.5-397B MoE                     A     83.5%
4  MiniMax M2 / M2.5      MiniMax M2.5 (230B/10B active MoE)   A     82.1%
5  GLM / Z.ai (Zhipu AI)  GLM-5.1 (744B MoE / 40B active)      A     81.2%
6  Llama 4 (Meta)         Llama 4 Maverick (17B/400B MoE)      B     80.5%
7  Nemotron (Nvidia)      Nemotron 3 Ultra (253B)              B     79.8%

About MMLU-Pro

Creator: TIGER-Lab, 2024
Unit: % (max 100)
