AIME: 2026 AI Leaderboard

The American Invitational Mathematics Examination, used as a rolling frontier-math benchmark.

What it tests

AIME is a 15-question annual math competition for top US high school students; each answer is an integer from 0 to 999. LLM labs now use each year's fresh AIME problems as a contamination-resistant reasoning benchmark, because the questions are made public only after the competition is administered.

How it is scored

Accuracy over 15 problems, reported per year (AIME 2024, AIME 2025, etc.). Each year's problem set is rerun against the current model lineup.
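
As a rough sketch of how a per-year score can be computed (exact-match grading of the 0-999 integer answer, then accuracy over the 15 problems), here is a minimal example. The answer-extraction regex and the variable names are illustrative assumptions, not the actual harness behind this leaderboard.

```python
import re


def extract_answer(model_output: str) -> int | None:
    """Pull the last standalone 1-3 digit integer from a model's response.

    Real harnesses usually prompt for a boxed or clearly marked final answer;
    this regex fallback is only an illustrative assumption.
    """
    matches = re.findall(r"\b\d{1,3}\b", model_output)
    return int(matches[-1]) if matches else None


def aime_accuracy(gold_answers: list[int], model_outputs: list[str]) -> float:
    """Exact-match accuracy over one year's 15 AIME problems."""
    assert len(gold_answers) == len(model_outputs) == 15
    correct = sum(
        extract_answer(out) == gold
        for gold, out in zip(gold_answers, model_outputs)
    )
    return correct / len(gold_answers)


# Hypothetical usage: 13 of 15 correct would report as 86.7%.
# score = aime_accuracy(gold_answers_2026, outputs_2026)
# print(f"AIME 2026: {score:.1%}")
```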

Why it matters

AIME is the cleanest year-over-year math-reasoning signal. A model that scores 99%+ on AIME 2024 but only 60% on AIME 2026 almost certainly benefited from training-data contamination on the older set; fresh-year scores are the honest test.

Leaderboard (7 models)

Sorted by AIME score. The Tier column shows the tool's overall AIToolTier rank, which blends this benchmark with pricing, features, and real-world usability.

1. Claude (Anthropic): Claude Opus 4.7. Tier A. AIME score: 99.8%
   (4.6 baseline scores shown; 4.7 announced 13% coding lift, 3x production task completion)
2. Kimi K2.5 (Moonshot): Kimi K2.5 (1T/32B active MoE). Tier A. AIME score: 91.2%
3. Gemma 4 (Google): Gemma 4 31B. Tier A. AIME score: 89.2%
4. Qwen (Alibaba): Qwen3.5-397B MoE. Tier A. AIME score: 87%
5. MiniMax M2 / M2.5: MiniMax M2.5 (230B/10B active MoE). Tier A. AIME score: 85.3%
6. Nemotron (Nvidia): Nemotron 3 Ultra (253B). Tier B. AIME score: 84.5%
7. ChatGPT: GPT-5.4. Tier A. AIME score: 83.3%

About AIME

Creator: Mathematical Association of America
Unit: % (max 100)
