AIME: 2026 AI Leaderboard
The American Invitational Mathematics Examination, used as a rolling frontier-math benchmark.
What it tests
AIME is an annual 15-question math competition for top US high school students; every answer is an integer from 0 to 999. LLM labs now run each year's fresh AIME problems as a contamination-resistant reasoning benchmark, because the questions are made public only after the competition, so models with earlier training cutoffs cannot have seen them.
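Because every answer is an integer from 0 to 999, grading reduces to extracting a model's final integer and checking it against the key. A minimal sketch of that extraction step in Python, assuming the model states its final answer as the last number in its completion (the function name and regex are illustrative, not any lab's actual harness):

```python
import re

def parse_aime_answer(completion: str) -> int | None:
    """Pull the last integer out of a model completion and validate it
    against the AIME answer format: an integer from 0 to 999."""
    matches = re.findall(r"\d+", completion)
    if not matches:
        return None  # no integer stated at all -> scored as wrong
    answer = int(matches[-1])
    return answer if 0 <= answer <= 999 else None

print(parse_aime_answer("... so the answer is 204."))  # 204
print(parse_aime_answer("The result is 1337."))        # None (out of AIME range)
```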
How it is scored
Accuracy over the 15 problems, reported per exam year (AIME 2024, AIME 2025, etc.); each year's problem set is rerun against the current model lineup.
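Concretely, the score is just per-year accuracy. A sketch under an assumed graded-result format (the dict keys are an invention for illustration, not a published schema):

```python
from collections import defaultdict

def per_year_accuracy(results: list[dict]) -> dict[str, float]:
    """Group graded problems by exam year and report the fraction correct.
    Each result is assumed to look like {"year": "AIME 2025", "correct": True}."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for r in results:
        totals[r["year"]][0] += r["correct"]  # bool counts as 0/1
        totals[r["year"]][1] += 1
    return {year: hits / n for year, (hits, n) in totals.items()}

# 13 of 15 correct on one year's exam -> about 0.867
results = [{"year": "AIME 2025", "correct": c} for c in [True] * 13 + [False] * 2]
print(per_year_accuracy(results))
```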
Why it matters
AIME is the cleanest year-over-year math-reasoning signal available. A model that scores 99%+ on AIME 2024 but 60% on AIME 2026 almost certainly saw the older problems in training; fresh-year scores are the honest test.
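That contamination check is easy to make concrete: compare a model's best score on older (potentially seen) years with its score on the freshest year. A sketch only; the 20-point threshold is an arbitrary illustration, not a standard this leaderboard applies:

```python
def contamination_gap(scores: dict[str, float], fresh_year: str) -> float:
    """Best older-year score minus the freshest-year score.
    A large positive gap is a red flag for training-data contamination."""
    older_best = max(v for year, v in scores.items() if year != fresh_year)
    return older_best - scores[fresh_year]

scores = {"AIME 2024": 0.99, "AIME 2025": 0.93, "AIME 2026": 0.60}
gap = contamination_gap(scores, "AIME 2026")
print(f"gap = {gap:.2f}:", "suspect contamination" if gap > 0.20 else "looks clean")
```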
Leaderboard (7 models)
Sorted by AIME score. The Tier column shows the tool's overall AIToolTier rank, which blends this benchmark with pricing, features, and real-world usability (a sketch of such a blend follows the table).
| # | Model | Tier | AIME score | Variant | Overall |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 (Anthropic; 4.6 baseline scores shown, 4.7 announced a 13% coding lift and 3x production-task completion) | A | 99.8% | AIME 2024 | 8.5/10 |
| 2 | Kimi K2.5 (Moonshot; 1T/32B-active MoE) | A | 91.2% | AIME 2025 | 8.1/10 |
| 3 | Gemma 4 31B (Google) | A | 89.2% | AIME 2026 | 8.3/10 |
| 4 | Qwen3.5-397B MoE (Alibaba) | A | 87.0% | AIME 2025 | 8.8/10 |
| 5 | MiniMax M2.5 (230B/10B-active MoE) | A | 85.3% | AIME 2025 | 8.4/10 |
| 6 | Nemotron 3 Ultra (253B, Nvidia) | B | 84.5% | AIME 2025 | 7.8/10 |
| 7 | GPT-5.4 (ChatGPT) | A | 83.3% | AIME 2024 | 8.8/10 |
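The site does not publish the exact AIToolTier formula, so the weighted blend below is only a sketch of what combining a benchmark score with pricing, features, and usability could look like; every weight is invented:

```python
def overall_score(aime: float, pricing: float, features: float, usability: float) -> float:
    """Blend a benchmark score with non-benchmark axes (all on a 0-1 scale).
    The weights are purely illustrative assumptions."""
    return 0.4 * aime + 0.2 * pricing + 0.2 * features + 0.2 * usability

# Hypothetical inputs for a model scoring 87.3% on AIME:
print(f"{overall_score(0.873, 0.7, 0.8, 0.9):.2f}")  # 0.83
```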
About AIME
- Creator: Mathematical Association of America
- Unit: % (max 100)
- Official source: https://www.maa.org/math-competitions/aime