AIME: 2026 AI Leaderboard
The American Invitational Mathematics Examination, used as a rolling frontier-math benchmark.
What it tests
AIME is an annual 15-question math competition for top US high school students; every answer is an integer from 0 to 999. LLM labs now run each year's fresh AIME problems as a contamination-resistant reasoning benchmark, because the questions are made public only after the competition, so models with earlier training cutoffs cannot have seen them.
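Because every answer is an integer from 0 to 999, grading reduces to extracting a model's final integer and checking it against the key. A minimal sketch of that extraction step in Python, assuming the model states its final answer as the last number in its completion (the function name and regex are illustrative, not any lab's actual harness):

```python
import re

def parse_aime_answer(completion: str) -> int | None:
    """Pull the last integer out of a model completion and validate it
    against the AIME answer format: an integer from 0 to 999."""
    matches = re.findall(r"\d+", completion)
    if not matches:
        return None  # no integer stated at all -> scored as wrong
    answer = int(matches[-1])
    return answer if 0 <= answer <= 999 else None

print(parse_aime_answer("... so the answer is 204."))  # 204
print(parse_aime_answer("The result is 1337."))        # None (out of AIME range)
```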
How it is scored
Accuracy over the 15 problems, reported per exam year (AIME 2024, AIME 2025, etc.); each year's problem set is rerun against the current model lineup.
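Concretely, the score is just per-year accuracy. A sketch under an assumed graded-result format (the dict keys are an invention for illustration, not a published schema):

```python
from collections import defaultdict

def per_year_accuracy(results: list[dict]) -> dict[str, float]:
    """Group graded problems by exam year and report the fraction correct.
    Each result is assumed to look like {"year": "AIME 2025", "correct": True}."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for r in results:
        totals[r["year"]][0] += r["correct"]  # bool counts as 0/1
        totals[r["year"]][1] += 1
    return {year: hits / n for year, (hits, n) in totals.items()}

# 13 of 15 correct on one year's exam -> about 0.867
results = [{"year": "AIME 2025", "correct": c} for c in [True] * 13 + [False] * 2]
print(per_year_accuracy(results))
```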
Why it matters
AIME is the cleanest year-over-year math-reasoning signal available. A model that scores 99%+ on AIME 2024 but 60% on AIME 2026 almost certainly saw the older problems in training; fresh-year scores are the honest test.
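That contamination check is easy to make concrete: compare a model's best score on older (potentially seen) years with its score on the freshest year. A sketch only; the 20-point threshold is an arbitrary illustration, not a standard this leaderboard applies:

```python
def contamination_gap(scores: dict[str, float], fresh_year: str) -> float:
    """Best older-year score minus the freshest-year score.
    A large positive gap is a red flag for training-data contamination."""
    older_best = max(v for year, v in scores.items() if year != fresh_year)
    return older_best - scores[fresh_year]

scores = {"AIME 2024": 0.99, "AIME 2025": 0.93, "AIME 2026": 0.60}
gap = contamination_gap(scores, "AIME 2026")
print(f"gap = {gap:.2f}:", "suspect contamination" if gap > 0.20 else "looks clean")
```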
Leaderboard (7 models)
Sorted by AIME score. The Tier column shows the tool's overall AIToolTier rank, which blends this benchmark with pricing, features, and real-world usability (a sketch of such a blend follows the table).
| # | Model | Tier | AIME score | Variant | Overall |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 (Anthropic; 4.6 baseline scores shown, 4.7 announced a 13% coding lift and 3x production-task completion) | A | 99.8% | AIME 2024 | 8.5/10 |
| 2 | Kimi K2.5 (Moonshot; 1T/32B-active MoE) | A | 91.2% | AIME 2025 | 8.1/10 |
| 3 | Gemma 4 31B (Google) | A | 89.2% | AIME 2026 | 8.3/10 |
| 4 | Qwen3.5-397B MoE (Alibaba) | A | 87.0% | AIME 2025 | 8.8/10 |
| 5 | MiniMax M2.5 (230B/10B-active MoE) | A | 85.3% | AIME 2025 | 8.4/10 |
| 6 | Nemotron 3 Ultra (253B, Nvidia) | B | 84.5% | AIME 2025 | 7.8/10 |
| 7 | GPT-5.4 (ChatGPT) | A | 83.3% | AIME 2024 | 8.8/10 |
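The site does not publish the exact AIToolTier formula, so the weighted blend below is only a sketch of what combining a benchmark score with pricing, features, and usability could look like; every weight is invented:

```python
def overall_score(aime: float, pricing: float, features: float, usability: float) -> float:
    """Blend a benchmark score with non-benchmark axes (all on a 0-1 scale).
    The weights are purely illustrative assumptions."""
    return 0.4 * aime + 0.2 * pricing + 0.2 * features + 0.2 * usability

# Hypothetical inputs for a model scoring 87.3% on AIME:
print(f"{overall_score(0.873, 0.7, 0.8, 0.9):.2f}")  # 0.83
```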
About AIME
- Creator: Mathematical Association of America
- Unit: % (max 100)
- Official source: https://www.maa.org/math-competitions/aime