ARC-AGI: 2026 AI Leaderboard
Abstract visual reasoning puzzles designed to stay hard for LLMs.
What it tests
ARC-AGI (Abstraction and Reasoning Corpus) is a set of grid-based visual puzzles where a model sees a few input/output example grids and must infer the transformation rule, then apply it to a new input. Each puzzle is designed to require an abstraction that the model cannot simply retrieve from its training data.
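To make the task shape concrete, here is a minimal sketch in Python. It assumes puzzles are stored as nested lists of small integers under `train` and `test` keys, which mirrors the publicly released ARC JSON layout. The toy puzzle, the `mirror_rows` rule, and the `fits_examples` helper are hypothetical names invented for illustration; real puzzles are far less obvious.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # each cell is a small integer standing for a colour

# Hypothetical toy puzzle: the hidden rule is "mirror each row left to right".
task: Dict[str, list] = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[3, 4], [0, 5]], "output": [[4, 3], [5, 0]]},
    ],
    "test": [{"input": [[7, 0, 8]], "output": [[8, 0, 7]]}],
}


def mirror_rows(grid: Grid) -> Grid:
    """Candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_examples(rule: Callable[[Grid], Grid], examples: list) -> bool:
    """A rule is plausible only if it reproduces every example output exactly."""
    return all(rule(ex["input"]) == ex["output"] for ex in examples)


# Check the candidate rule against the examples, then apply it to the test input.
if fits_examples(mirror_rows, task["train"]):
    prediction = mirror_rows(task["test"][0]["input"])
    print(prediction)  # [[8, 0, 7]]
```

The sketch only checks a rule that was written by hand. The hard part of the benchmark is finding that rule from two or three examples.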
How it is scored
The score is accuracy on held-out puzzles: the percentage of puzzles for which the model produces the correct output grid. A 50% score is considered a major frontier milestone. ARC-AGI-2 is the harder current version; a $1M prize was offered for solving it.
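The scoring arithmetic can be sketched as follows. This is a simplified illustration that assumes one prediction per puzzle and exact grid equality as the success criterion; the official evaluation rules (for example, how many attempts are allowed) may differ. The `accuracy` function and the sample grids are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def accuracy(predictions: List[Grid], targets: List[Grid]) -> float:
    """Percentage of puzzles whose predicted grid matches the target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    # A puzzle counts as solved only if every cell and the grid shape match.
    solved = sum(pred == target for pred, target in zip(predictions, targets))
    return 100.0 * solved / len(targets)


# Four made-up held-out puzzles: two exact matches, one wrong cell, one wrong shape.
targets = [[[1, 2]], [[3], [4]], [[5]], [[6, 6]]]
predictions = [[[1, 2]], [[3], [0]], [[5]], [[6]]]
print(accuracy(predictions, targets))  # 50.0
```

There is no partial credit in this sketch: a grid with a single wrong cell scores the same as a completely wrong grid.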
Why it matters
ARC-AGI is designed to resist gains that come from scale alone. Strong performance suggests genuine abstract-reasoning capability rather than pattern completion, which makes the benchmark useful for distinguishing models that are 'thinking' from models that are 'searching training data'.
Leaderboard (2 models)
Sorted by ARC-AGI score. The Tier column shows the tool's overall AIToolTier rank, which blends this benchmark with pricing, features, and real-world usability.
| # | Model | Tier | ARC-AGI score | Variant | Overall |
|---|---|---|---|---|---|
| 1 | Claude (Anthropic) Claude Fable 5 (launched 2026-06-09) is now the flagship -- Anthropic positions it as its most capable public model on SWE, knowledge work, and vision, but published no standalone numeric benchmark table at launch; legacy Opus-line reasoning-suite scores shown below as baseline, third-party Fable 5 verification pending | A | 75.2% | ARC-AGI | 8.5/10 |
| 2 | ChatGPT GPT-5.5 (launched 2026-04-23; scores below are the GPT-5.4 baseline -- GPT-5.5 launch benchmarks per OpenAI are logged in Known Issues, pending third-party verification) | A | 73.3% | ARC-AGI | 8.8/10 |
About ARC-AGI
- Creator: Francois Chollet, 2019 (v2 2024)
- Unit: % (max 100)
- Official source: https://arcprize.org/