Knowledge

MMLU-Pro: 2026 AI Leaderboard

MMLU's harder successor: 10 answer choices and more reasoning.

What it tests

MMLU-Pro is a successor to MMLU that expands each question to 10 answer choices (up from 4) and rewrites prompts to require multi-step reasoning rather than pure recall.
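Going from 4 to 10 options also lowers the score a model can reach by guessing alone. Assume exactly k options per question and a uniform random guess. Expected accuracy is then 1/k. The sketch below just makes that arithmetic explicit. `chance_accuracy` is a hypothetical helper name.

```python
def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy (in %) from guessing uniformly at random among num_choices options."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return 100.0 / num_choices


# Base MMLU uses 4 options per question; MMLU-Pro uses 10.
for name, k in [("MMLU", 4), ("MMLU-Pro", 10)]:
    print(f"{name}: {k} choices -> {chance_accuracy(k):.1f}% by chance")
```

The guessing floor drops from 25% to 10%. That is one reason scores fall even before the harder reasoning content is taken into account.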

How it is scored

Scored by accuracy, as on MMLU (the percentage of questions answered correctly), but on the harder, reformulated question bank. Frontier models score roughly 10-20 percentage points lower here than on base MMLU.
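Accuracy scoring reduces to comparing each chosen letter with the answer key. The minimal sketch below assumes the model is prompted to end its reasoning with a phrase like "the answer is (X)". The answer letter then has to be parsed out of free text. The regex and the names `extract_choice` and `accuracy` are hypothetical illustrations, not the official TIGER-Lab evaluation code.

```python
import re
from typing import Optional

# Hypothetical, simplified extraction: assumes the model was told to finish with
# "the answer is (X)", where X is one of the ten option letters A-J.
# \b stops the pattern from matching the first letter of a word like "Definitely".
ANSWER_PATTERN = re.compile(r"answer is \(?([A-J])\b")


def extract_choice(model_output: str) -> Optional[str]:
    """Return the last option letter the output commits to, or None if there is none."""
    matches = ANSWER_PATTERN.findall(model_output)
    return matches[-1] if matches else None


def accuracy(outputs: list[str], gold: list[str]) -> float:
    """Percentage of questions whose extracted letter matches the gold letter.

    Outputs with no parseable answer count as incorrect.
    """
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    correct = sum(extract_choice(out) == ans for out, ans in zip(outputs, gold))
    return 100.0 * correct / len(gold)


outputs = [
    "Step by step... so the answer is (C).",
    "The answer is J",
    "I am not sure.",
    "Thus the answer is (B).",
]
gold = ["C", "J", "A", "D"]
print(accuracy(outputs, gold))  # 50.0
```

This sketch counts an unparseable answer as wrong rather than dropping the question. That keeps the denominator fixed across models, so a model cannot raise its score by declining to answer.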

Why it matters

Base MMLU has saturated: frontier models cluster so close to the ceiling that it barely separates them. MMLU-Pro is less saturated and still has headroom, which makes it a better discriminator among top-tier models in 2026.

Leaderboard (5 models)

Sorted by MMLU-Pro score. The Tier column shows each tool's overall AIToolTier rank, which blends this benchmark with pricing, features, and real-world usability.

#	Tool	Model	Tier	MMLU-Pro score	Variant	Overall
1	DeepSeek	DeepSeek V4-Pro (SWE-bench and Arena Elo third-party verified post-launch; knowledge-benchmark scores, including this one, are the V3.x baseline pending V4 figures)	A	85%	MMLU-Pro	8.0/10
2	Qwen (Alibaba)	Qwen3.5-397B MoE	A	83.5%	MMLU-Pro	8.8/10
3	GLM / Z.ai (Zhipu AI)	GLM-5.2 (~753B MoE, launched 2026-06-13; vendor-published score, third-party verification still settling)	A	81.2%	MMLU-Pro	8.0/10
4	Llama 4 (Meta)	Llama 4 Maverick (17B active / 400B total MoE)	B	80.5%	MMLU-Pro	7.9/10
5	Nemotron (Nvidia)	Llama-Nemotron Ultra 253B (prior generation; third-party scores for Nemotron 3 Ultra 550B pending)	B	79.8%	MMLU-Pro	7.8/10
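The benchmark ranking and the Overall rating can disagree, because Overall blends in pricing, features, and usability. The small sketch below shows the two orderings side by side. The `rows` list copies the five rows of the leaderboard table with model names shortened. The variable names are illustrative only.

```python
# Values copied from the leaderboard table; names shortened for readability.
# Each row: (model, MMLU-Pro score in %, Overall rating out of 10)
rows = [
    ("DeepSeek V4-Pro", 85.0, 8.0),
    ("Qwen3.5-397B MoE", 83.5, 8.8),
    ("GLM-5.2", 81.2, 8.0),
    ("Llama 4 Maverick", 80.5, 7.9),
    ("Llama-Nemotron Ultra 253B", 79.8, 7.8),
]

by_mmlu_pro = sorted(rows, key=lambda r: r[1], reverse=True)
# sorted() is stable even with reverse=True, so the Overall tie at 8.0
# (DeepSeek and GLM) keeps the table's original order.
by_overall = sorted(rows, key=lambda r: r[2], reverse=True)

print("By MMLU-Pro:", [name for name, _, _ in by_mmlu_pro])
print("By Overall: ", [name for name, _, _ in by_overall])

# Gap between first and last place on this benchmark, in percentage points.
spread = by_mmlu_pro[0][1] - by_mmlu_pro[-1][1]
print(f"Top-to-bottom spread: {spread:.1f} points")
```

Qwen3.5 leads on Overall while DeepSeek leads on MMLU-Pro. The table also flags that the DeepSeek row's MMLU-Pro figure is the V3.x baseline, so that first-place position may shift once V4 figures are published.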

About MMLU-Pro

Creator: TIGER-Lab, 2024
Unit: % (max 100)
Official source: https://arxiv.org/abs/2406.01574
