MiniMax M2 / M2.5
A Tier · 8.4/10
MiniMax's open-weights frontier model -- the first open model to match Claude Opus 4.6 on SWE-Bench at 10-20× lower cost
Score Breakdown
Benchmark Scores
Benchmarks for MiniMax M2.5 (230B/10B active MoE)
| Benchmark | Description | Score |
|---|---|---|
| MMLU-Pro | Harder multi-subject reasoning | 82.1% |
| GPQA Diamond | Graduate-level science questions | 76.8% |
| SWE-Bench Verified | Human-validated real GitHub issue fixes | 80.2% |
| HumanEval | Python code generation | 91% |
| AIME 2025 | Competition mathematics | 85.3% |
Last updated: 2026-04-13
The Good and the Bad
What we like
- +First open-weight model to hit 80.2% on SWE-Bench Verified -- matching Claude Opus 4.6
- +~10B active params during inference (out of 230B) means fast and cheap to run
- +MIT license with zero commercial restrictions
- +Native agentic and tool-use training -- not bolted on (see the tool-call sketch after this list)
- +Per-layer QK-Norm plus full-attention blocks keep long-context inference stable
- +10-20× cheaper than closed frontier models at similar quality, per Bytebot's analysis
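The native tool-use point is easiest to see in a request. Below is a minimal sketch against OpenRouter's OpenAI-compatible endpoint; the `minimax/minimax-m2` slug and the `run_tests` tool are illustrative assumptions, not confirmed identifiers -- verify the slug on OpenRouter's model list.

```python
# Minimal tool-call sketch via OpenRouter's OpenAI-compatible API.
# Assumptions: the "minimax/minimax-m2" slug and the run_tests tool
# are illustrative, not confirmed identifiers.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool for this example
        "description": "Run the project's test suite and return failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="minimax/minimax-m2",
    messages=[{"role": "user", "content": "Fix the failing test in ./pkg"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```

Because the tool-use training is native rather than retrofitted, the model tends to emit well-formed `tool_calls` without extra schema-coaxing in the prompt.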
What could be better
- −Smaller Western community than Qwen/DeepSeek -- tutorials are sparse
- −Ollama support arrived late -- community relied on vLLM for months
- −English writing tone is noticeably less polished than Claude or Mistral
- −PRC content filters apply
- −MiniMax as a lab is less well-known than Alibaba or DeepSeek -- some enterprise buyers hesitate
Pricing
Self-hosted (Free)
- ✓MIT license on M2 and M2.5
- ✓Weights on Hugging Face (download sketch below)
- ✓Commercial use permitted
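A minimal download sketch, assuming the weights sit under the `MiniMaxAI` org named in Sources; the exact repo id should be verified on the hub:

```python
# Pull the MIT-licensed weights locally before serving.
# Assumption: "MiniMaxAI/MiniMax-M2" is the repo id -- inferred from the
# Hugging Face MiniMaxAI collection cited in Sources, not confirmed.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="MiniMaxAI/MiniMax-M2",
    local_dir="./minimax-m2",
)
print(f"Weights in {local_dir}")
```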
API (OpenRouter / MiniMax)
- ✓M2: $0.30/M input / $1.20/M output tokens (worked cost example below)
- ✓192K+ context
- ✓Native agentic + tool-use
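At those rates, agentic sessions stay cheap even with heavy context. A back-of-envelope check -- the token counts are assumptions for illustration, not measurements:

```python
# Worked cost example at the listed OpenRouter rates for M2.
IN_RATE, OUT_RATE = 0.30, 1.20  # USD per million tokens (from above)

input_tokens = 400_000   # assumed: repo context + tool results
output_tokens = 60_000   # assumed: patches + reasoning traces

cost = input_tokens / 1e6 * IN_RATE + output_tokens / 1e6 * OUT_RATE
print(f"${cost:.3f} per session")  # -> $0.192
```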
System Requirements
Hardware needed to self-host. Min = smallest viable setup (usually heavy quantization). Max = full-precision / production-grade.
| Model variant | Min | Max |
|---|---|---|
| MiniMax M2 / M2.5 (230B total, ~10B active MoE) -- sparse MoE activates only ~10B params during inference, so tok/s stays high on moderate hardware | 96 GB unified RAM at Q3 quantization (e.g. Mac M3 Ultra) | 4× A100 80 GB at FP8 |
| MiniMax M1 (hybrid-attention reasoning predecessor) | 96 GB unified RAM at Q3 quantization | 4× A100 80 GB at FP8 |
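For the "Max" column, a minimal vLLM serving sketch -- assuming a vLLM build with MiniMax M2 support (early releases needed a custom build; see Known Issues) and the `MiniMaxAI/MiniMax-M2` repo id from the download example above:

```python
# Sketch: offline inference on 4x A100 80 GB, one tensor-parallel shard
# per GPU. Repo id and context length are assumptions from this page.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M2",  # assumed repo id
    tensor_parallel_size=4,         # shard across the 4 A100s
    max_model_len=192_000,          # matches the advertised 192K+ context
)
outputs = llm.generate(
    ["Write a unified diff fixing an off-by-one loop bound."],
    SamplingParams(max_tokens=512),
)
print(outputs[0].outputs[0].text)
```

The table's FP8 figure assumes an FP8-quantized checkpoint; at Q3 on unified memory, a llama.cpp-style runtime with community quants is the more common path.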
Known Issues
- M2's initial release required a custom vLLM build -- community quants took 2-3 weeks to stabilize. Source: GitHub MiniMax-AI/MiniMax-M2, Hugging Face discussions · 2026-02
- Per-layer QK-Norm is non-standard -- some inference backends had subtle bugs at long context. Source: Reddit r/LocalLLaMA · 2026-03
Best for
Agentic coding and tool-use workflows on a budget. Best price-to-SWE-Bench ratio of any open-weights model in 2026.
Not for
Teams that prioritize polished English writing (Mistral Large 3 or Claude are better), or anyone who needs the deepest ecosystem support (Llama is still that).
Our Verdict
MiniMax M2/M2.5 is the most cost-efficient frontier-tier open model in 2026. The 80.2% SWE-Bench Verified score is a genuine breakthrough -- matching Claude Opus 4.6 on real coding tasks at a tenth of the price. The sparse 10B-active MoE runs fast on moderate hardware. The main drawback is ecosystem: MiniMax has less Western infrastructure support than Alibaba or DeepSeek. If you're building an agentic product and want maximum value per token, M2.5 is an A-tier pick.
Sources
- Artificial Analysis MiniMax M2 benchmarks (accessed 2026-04-13)
- Bytebot MiniMax M2.5 analysis (accessed 2026-04-13)
- Hugging Face MiniMaxAI collection (accessed 2026-04-13)
- GitHub MiniMax-AI/MiniMax-M2 (accessed 2026-04-13)
- OpenRouter pricing (accessed 2026-04-13)
Alternatives to MiniMax M2 / M2.5
Llama 4 (Meta)
Meta's open-weights flagship family -- Scout (10M context), Maverick (multimodal 400B MoE), Behemoth in preview
Mistral AI
European AI lab with open and commercial models that punch well above their size
DeepSeek
Near-frontier reasoning for pennies on the dollar -- the open-source LLM that made Silicon Valley nervous
Gemma 4 (Google)
Google DeepMind's open-weights model family -- multimodal, 256K context, runs on edge devices
Qwen (Alibaba)
Alibaba's open-weights family -- Qwen3.5, Qwen3-Coder-Next, Qwen3-VL, Qwen3-Max. Apache 2.0 flagship sizes.
GLM / Z.ai (Zhipu AI)
Zhipu AI's open-weights family -- GLM-4.6 text flagship and GLM-4.6V multimodal, true MIT licensed
Kimi K2.5 (Moonshot)
Moonshot's 1T-parameter MoE open-weights flagship -- best open-source agentic coder, rivals Claude Opus 4.5
Nemotron (Nvidia)
Nvidia's open-weights family -- hybrid Mamba-Transformer MoE architecture, optimized for efficient reasoning on Nvidia hardware
Falcon (TII)
UAE's Technology Innovation Institute open-weights family -- Falcon 3 optimized for efficient sub-10B deployment on consumer hardware