Kimi K2.5 (Moonshot)
A Tier · 8.1/10
Moonshot's 1T-parameter MoE open-weights flagship -- best open-source agentic coder, rivals Claude Opus 4.5
Score Breakdown
Benchmark Scores
Benchmarks for Kimi K2.5 (1T/32B active MoE)
| Benchmark | Description | Score |
|---|---|---|
| MMLU-Pro | Harder multi-subject reasoning | 84.8% |
| GPQA Diamond | Graduate-level science questions | 80.5% |
| AIME 2025 | Competition mathematics | 91.2% |
| SWE-Bench Verified | Real-world GitHub issue resolution | 78.5% |
| LiveCodeBench | Contamination-resistant coding problems | 74.1% |
Last updated: 2026-04-13
The Good and the Bad
What we like
- Frontier-tier performance -- Elo 1309 on GDPval-AA, behind only OpenAI and Anthropic flagships
- Beats Claude Opus 4.5 on several coding benchmarks per community testing
- Unified thinking + non-thinking modes in one model (no need to swap)
- 256K context window handles large codebases for agentic coding
- Modified MIT license permits commercial use of weights
- Native tool-use and agentic planning trained in -- not bolted on
What could be better
- 1T-parameter model is impractical to self-host without 4+ H100-class GPUs
- Moonshot is a smaller lab than DeepSeek/Alibaba -- less Western infrastructure support
- API pricing ($0.60 in / $3.00 out per 1M tokens) is higher than DeepSeek V3.2 ($0.28 in / $0.42 out)
- PRC content filters apply (Tiananmen, Taiwan, etc.)
- Documentation is heavily Chinese-first -- English docs trail releases
Pricing
Self-hosted (Free)
- Modified MIT license -- commercial use allowed
- Weights on Hugging Face
- Fine-tuning permitted
API (Moonshot / OpenRouter)
- K2.5-Reasoning: $0.60 in / $3.00 out per 1M tokens
- 256K context
- Blended cost ~$1.07 per 1M tokens (see the cost sketch below)
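The blended figure is just a weighted average of the split input/output rates. A minimal sketch, assuming a roughly 4:1 input:output token mix -- the exact mix behind the ~$1.07 figure isn't stated in our sources, so that ratio is an assumption that happens to reproduce it; the more common 3:1 convention lands higher:

```python
# Minimal sketch: blended $/M-token cost from split input/output pricing.
# The 4:1 input:output mix is an ASSUMPTION chosen to roughly reproduce the
# ~$1.07/M figure quoted above; a 3:1 mix gives ~$1.20/M.
IN_PRICE, OUT_PRICE = 0.60, 3.00  # $ per 1M tokens (K2.5-Reasoning)

def blended(in_price: float, out_price: float, in_share: float) -> float:
    """Weighted-average cost per 1M tokens for a given input-token share."""
    return in_price * in_share + out_price * (1.0 - in_share)

print(f"4:1 mix: ${blended(IN_PRICE, OUT_PRICE, 0.80):.2f}/M")  # ~$1.08
print(f"3:1 mix: ${blended(IN_PRICE, OUT_PRICE, 0.75):.2f}/M")  # ~$1.20
```

For agentic coding workloads, which tend to be output-heavy, the effective blended rate drifts toward the $3.00 output price.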
System Requirements
Hardware needed to self-host. Min = smallest viable setup (usually heavy quantization). Max = full-precision / production-grade.
| Model variant | Min | Max |
|---|---|---|
| Kimi K2.5 (1T total, 32B active MoE) -- practically a hosted-only model for most users; self-hosting requires enterprise hardware (memory math sketched below) | 256 GB unified RAM, e.g. Mac Studio M3 Ultra (Q2, ~3 tok/s) | 8× H200 141 GB (FP8) or 16× H100 (production-grade) |
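The Min column is easy to sanity-check from first principles: an MoE must keep all 1T weights resident even though only 32B are active per token, so weight memory ≈ total params × bits-per-weight ÷ 8. A rough sketch -- the bits-per-weight values are approximations, real quant schemes vary, and KV cache plus runtime overhead come on top:

```python
# Back-of-the-envelope weight-memory estimate for a 1T-parameter MoE.
# All 1T parameters must stay resident even though only 32B are active per
# token -- expert routing touches different experts on every token.
TOTAL_PARAMS = 1.0e12

def weight_gb(bits_per_weight: float) -> float:
    """Approximate weight memory in GB (excludes KV cache and activations)."""
    return TOTAL_PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("FP8", 8.0), ("Q4", 4.5), ("Q2", 2.0)]:
    print(f"{name:>4}: ~{weight_gb(bits):,.0f} GB")

# FP8: ~1,000 GB -- consistent with 8x H200 (8 x 141 GB = 1,128 GB total)
# Q2:  ~250 GB  -- why 256 GB unified RAM is the floor, with little headroom
```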
Known Issues
- Self-hosting K2.5 at usable speed requires $30K+ in enterprise GPU hardware -- realistically this is a hosted-API model (source: Reddit r/LocalLLaMA, llm-stats.com, 2026-03)
- Early K2.5 releases had inconsistent tool-calling when quantized below Q4 -- community fixes landed March 2026; a quick way to sanity-check tool calls is sketched below (source: Hugging Face discussions, 2026-03)
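Given that history, it's worth validating tool-call arguments before dispatching them, whichever endpoint serves the model. A hedged sketch using the OpenAI-compatible OpenRouter API -- the `moonshotai/kimi-k2.5` model slug and the `run_tests` tool are illustrative assumptions, not confirmed identifiers:

```python
# Sketch: exercise tool-calling via an OpenAI-compatible endpoint and verify
# the returned arguments parse as JSON before acting on them. The model slug
# below is an ASSUMPTION -- check your provider's model list.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool for illustration
        "description": "Run the project's test suite and return a summary.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="moonshotai/kimi-k2.5",  # assumed slug
    messages=[{"role": "user", "content": "Run the tests under ./src."}],
    tools=tools,
)

# The failure mode reported for sub-Q4 quants was malformed or truncated
# arguments, so parse before dispatching.
for call in resp.choices[0].message.tool_calls or []:
    try:
        args = json.loads(call.function.arguments)
        print("dispatch:", call.function.name, args)
    except json.JSONDecodeError:
        print("malformed tool call, retrying:", call.function.arguments)
```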
Best for
Agentic coding workflows, tool-use agents, and teams willing to pay hosted-API prices for frontier-tier quality with open-weights licensing protection.
Not for
Solo developers or hobbyists who want to run models locally -- the 1T-parameter size makes that impractical. For self-hosting, use Qwen3-Coder-Next or DeepSeek instead.
Our Verdict
Kimi K2.5 is the best open-weights model in the world right now for agentic coding. It legitimately rivals Claude Opus 4.5 and Gemini 3.1 Pro on practical coding tasks while being nominally 'open.' The catch is that the 1T-parameter size makes it hosted-only for 99% of users. If you're picking between hosted APIs and you want maximum quality with the protection of open-weights licensing, Kimi K2.5 is the top pick. If you need a model that actually runs on your hardware, look at Qwen3-Coder-Next or DeepSeek V3.2 instead.
Sources
- Moonshot Kimi K2.5 release (accessed 2026-04-13)
- Artificial Analysis GDPval-AA leaderboard (accessed 2026-04-13)
- llm-stats.com (accessed 2026-04-13)
- OpenRouter pricing (accessed 2026-04-13)
- Reddit r/singularity, r/LocalLLaMA (accessed 2026-04-13)
Alternatives to Kimi K2.5 (Moonshot)
Llama 4 (Meta)
Meta's open-weights flagship family -- Scout (10M context), Maverick (multimodal 400B MoE), Behemoth in preview
Mistral AI
European AI lab with open and commercial models that punch well above their size
DeepSeek
Near-frontier reasoning for pennies on the dollar -- the open-source LLM that made Silicon Valley nervous
Gemma 4 (Google)
Google DeepMind's open-weights model family -- multimodal, 256K context, runs on edge devices
Qwen (Alibaba)
Alibaba's open-weights family -- Qwen3.5, Qwen3-Coder-Next, Qwen3-VL, Qwen3-Max. Apache 2.0 flagship sizes.
GLM / Z.ai (Zhipu AI)
Zhipu AI's open-weights family -- GLM-4.6 text flagship and GLM-4.6V multimodal, true MIT licensed
Nemotron (Nvidia)
Nvidia's open-weights family -- hybrid Mamba-Transformer MoE architecture, optimized for efficient reasoning on Nvidia hardware
MiniMax M2 / M2.5
MiniMax's open-weights frontier -- first open model to match Claude Opus 4.6 on SWE-Bench at 10-20× lower cost
Falcon (TII)
UAE's Technology Innovation Institute open-weights family -- Falcon 3 optimized for efficient sub-10B deployment on consumer hardware