MiniMax M2 / M2.5
A Tier · 8.4/10
MiniMax's open-weights frontier model -- the first open model to match Claude Opus 4.6 on SWE-Bench at 10-20× lower cost
Score Breakdown
Benchmark Scores
Benchmarks for MiniMax M2.5 (230B/10B active MoE)
| Benchmark | Description | Score |
|---|---|---|
| MMLU-Pro | Harder multi-subject reasoning | 82.1% |
| GPQA Diamond | Graduate-level science questions | 76.8% |
| SWE-Bench Verified | Human-validated real GitHub issue fixes | 80.2% |
| HumanEval | Python code generation | 91% |
| AIME 2025 | Competition mathematics | 85.3% |
Last updated: 2026-04-13
The Good and the Bad
What we like
- +First open-weight model to hit 80.2% on SWE-Bench Verified -- matching Claude Opus 4.6
- +~10B active params during inference (out of 230B) means fast and cheap to run
- +MIT license with zero commercial restrictions
- +Native agentic and tool-use training -- not bolted on (see the tool-call sketch after this list)
- +Per-layer QK-Norm plus full-attention blocks keep long-context inference stable
- +10-20× cheaper than closed frontier models at similar quality, per Bytebot's analysis
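The native tool-use point is easiest to see in a request. Below is a minimal sketch against OpenRouter's OpenAI-compatible endpoint; the `minimax/minimax-m2` slug and the `run_tests` tool are illustrative assumptions, not confirmed identifiers -- verify the slug on OpenRouter's model list.

```python
# Minimal tool-call sketch via OpenRouter's OpenAI-compatible API.
# Assumptions: the "minimax/minimax-m2" slug and the run_tests tool
# are illustrative, not confirmed identifiers.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool for this example
        "description": "Run the project's test suite and return failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="minimax/minimax-m2",
    messages=[{"role": "user", "content": "Fix the failing test in ./pkg"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```

Because the tool-use training is native rather than retrofitted, the model tends to emit well-formed `tool_calls` without extra schema-coaxing in the prompt.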
What could be better
- −Smaller Western community than Qwen/DeepSeek -- tutorials are sparse
- −Ollama support arrived late -- community relied on vLLM for months
- −English writing tone is noticeably less polished than Claude or Mistral
- −PRC content filters apply
- −MiniMax as a lab is less well-known than Alibaba or DeepSeek -- some enterprise buyers hesitate
Pricing
Self-hosted (Free)
- ✓MIT license on M2 and M2.5
- ✓Weights on Hugging Face (download sketch below)
- ✓Commercial use permitted
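A minimal download sketch, assuming the weights sit under the `MiniMaxAI` org named in Sources; the exact repo id should be verified on the hub:

```python
# Pull the MIT-licensed weights locally before serving.
# Assumption: "MiniMaxAI/MiniMax-M2" is the repo id -- inferred from the
# Hugging Face MiniMaxAI collection cited in Sources, not confirmed.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="MiniMaxAI/MiniMax-M2",
    local_dir="./minimax-m2",
)
print(f"Weights in {local_dir}")
```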
API (OpenRouter / MiniMax)
- ✓M2: $0.30/M input / $1.20/M output tokens (worked cost example below)
- ✓192K+ context
- ✓Native agentic + tool-use
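At those rates, agentic sessions stay cheap even with heavy context. A back-of-envelope check -- the token counts are assumptions for illustration, not measurements:

```python
# Worked cost example at the listed OpenRouter rates for M2.
IN_RATE, OUT_RATE = 0.30, 1.20  # USD per million tokens (from above)

input_tokens = 400_000   # assumed: repo context + tool results
output_tokens = 60_000   # assumed: patches + reasoning traces

cost = input_tokens / 1e6 * IN_RATE + output_tokens / 1e6 * OUT_RATE
print(f"${cost:.3f} per session")  # -> $0.192
```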
System Requirements
Hardware needed to self-host. Min = smallest viable setup (usually heavy quantization). Max = full-precision / production-grade.
| Model variant | Min | Max |
|---|---|---|
| MiniMax M2 / M2.5 (230B total, ~10B active MoE) -- sparse MoE activates only ~10B params during inference, so tok/s stays high on moderate hardware | 96 GB unified RAM at Q3 quantization (e.g. Mac M3 Ultra) | 4× A100 80 GB at FP8 |
| MiniMax M1 (hybrid-attention reasoning predecessor) | 96 GB unified RAM at Q3 quantization | 4× A100 80 GB at FP8 |
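For the "Max" column, a minimal vLLM serving sketch -- assuming a vLLM build with MiniMax M2 support (early releases needed a custom build; see Known Issues) and the `MiniMaxAI/MiniMax-M2` repo id from the download example above:

```python
# Sketch: offline inference on 4x A100 80 GB, one tensor-parallel shard
# per GPU. Repo id and context length are assumptions from this page.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M2",  # assumed repo id
    tensor_parallel_size=4,         # shard across the 4 A100s
    max_model_len=192_000,          # matches the advertised 192K+ context
)
outputs = llm.generate(
    ["Write a unified diff fixing an off-by-one loop bound."],
    SamplingParams(max_tokens=512),
)
print(outputs[0].outputs[0].text)
```

The table's FP8 figure assumes an FP8-quantized checkpoint; at Q3 on unified memory, a llama.cpp-style runtime with community quants is the more common path.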
Known Issues
- M2's initial release required a custom vLLM build -- community quants took 2-3 weeks to stabilize. Source: GitHub MiniMax-AI/MiniMax-M2, Hugging Face discussions · 2026-02
- Per-layer QK-Norm is non-standard -- some inference backends had subtle bugs at long context. Source: Reddit r/LocalLLaMA · 2026-03
Best for
Agentic coding and tool-use workflows on a budget. Best price-to-SWE-Bench ratio of any open-weights model in 2026.
Not for
Teams that prioritize polished English writing (Mistral Large 3 or Claude are better), or anyone who needs the deepest ecosystem support (Llama is still that).
Our Verdict
MiniMax M2/M2.5 is the most cost-efficient frontier-tier open model in 2026. The 80.2% SWE-Bench Verified score is a genuine breakthrough -- matching Claude Opus 4.6 on real coding tasks at a tenth of the price. The sparse 10B-active MoE runs fast on moderate hardware. The main drawback is ecosystem: MiniMax has less Western infrastructure support than Alibaba or DeepSeek. If you're building an agentic product and want maximum value per token, M2.5 is an A-tier pick.
Sources
- Artificial Analysis MiniMax M2 benchmarks (accessed 2026-04-13)
- Bytebot MiniMax M2.5 analysis (accessed 2026-04-13)
- Hugging Face MiniMaxAI collection (accessed 2026-04-13)
- GitHub MiniMax-AI/MiniMax-M2 (accessed 2026-04-13)
- OpenRouter pricing (accessed 2026-04-13)
Alternatives to MiniMax M2 / M2.5
Llama 4 (Meta)
Meta's open-weights flagship family -- Scout (10M context), Maverick (multimodal 400B MoE), Behemoth in preview
Mistral AI
European AI lab with open and commercial models that punch well above their size
DeepSeek
Near-frontier reasoning for pennies on the dollar -- the open-source LLM that made Silicon Valley nervous
Gemma 4 (Google)
Google DeepMind's open-weights model family -- multimodal, 256K context, runs on edge devices
Qwen (Alibaba)
Alibaba's open-weights family -- Qwen3.5, Qwen3-Coder-Next, Qwen3-VL, Qwen3-Max. Apache 2.0 flagship sizes.
GLM / Z.ai (Zhipu AI)
Zhipu AI's open-weights family -- GLM-4.6 text flagship and GLM-4.6V multimodal, true MIT licensed
Kimi K2.5 (Moonshot)
Moonshot's 1T-parameter MoE open-weights flagship -- best open-source agentic coder, rivals Claude Opus 4.5
Nemotron (Nvidia)
Nvidia's open-weights family -- hybrid Mamba-Transformer MoE architecture, optimized for efficient reasoning on Nvidia hardware
Falcon (TII)
UAE's Technology Innovation Institute open-weights family -- Falcon 3 optimized for efficient sub-10B deployment on consumer hardware