Microsoft MAI-Thinking-1
B Tier · 7.5/10
Microsoft's first in-house reasoning model -- launched 2026-06-02 at Build as the flagship of seven new MAI models. 35B-active / ~1T-total sparse Mixture-of-Experts, 256K context. AIME 2025 97.0%, matches leading models on SWE-Bench Pro, and beat Claude Sonnet 4.6 in human-preference testing. Available on Microsoft Foundry + OpenRouter / Fireworks / Baseten
Score Breakdown
Benchmark Scores
Benchmarks for MAI-Thinking-1 (vendor-published 2026-06-02; third-party verification pending)
| Benchmark | Description | Score | |
|---|---|---|---|
| AIME 2025 | 97% | ||
| AIME 2026 | 94.5% |
Last updated: 2026-06-02
The Good and the Bad
What we like
- +Microsoft's first in-house frontier-class reasoning model -- removes the OpenAI dependency for the reasoning tier the same way MAI-Voice/Image/Transcribe did for speech and vision
- +Strong published reasoning numbers: AIME 2025 97.0% and AIME 2026 94.5%, plus it 'matches leading models on key software engineering benchmarks' (SWE-Bench Pro) at a medium model size
- +Sparse MoE design (35B active of ~1T total) is built for cost-efficiency -- Microsoft is positioning it as strong reasoning per dollar rather than a max-size flagship
- +256K context + Chat Completions API compatibility makes it a near drop-in for existing reasoning workloads, and it shipped with day-one availability on OpenRouter, Fireworks, and Baseten
What could be better
- −Pricing not disclosed at launch -- you cannot yet model cost vs Claude / Gemini / GPT reasoning tiers without going through a third-party inference provider's published rate
- −Foundry access is private preview at launch (MAI Playground public preview is 'coming soon') -- broad self-serve evaluation is not open yet
- −Microsoft's benchmark and preference numbers are self-reported; independent AIME / SWE-Bench Pro confirmation typically lags announcement by several weeks
- −No open weights -- this is an API/Foundry model, not a self-hostable open-weight release like Phi or the local-LLM tier
Pricing
Microsoft Foundry
- ✓Launched 2026-06-02 (Microsoft Build) -- private preview on Foundry at launch
- ✓256K-token context window
- ✓Chat Completions API compatible
- ✓Public preview on MAI Playground coming soon
Third-party inference (OpenRouter / Fireworks / Baseten)
- ✓Generally available at launch through OpenRouter, Fireworks, and Baseten
- ✓Pay-as-you-go at each provider's per-token rate
- ✓No first-party consumer subscription -- API/developer access only
Known Issues
- LAUNCH (2026-06-02, Microsoft Build): MAI-Thinking-1 is the flagship of the 'seven new MAI models' wave (alongside MAI-Code-1-Flash, MAI-Image-2.5 + Flash, MAI-Transcribe-1.5, MAI-Voice-2 + Flash). Microsoft describes it as 'a medium-sized model that stands among the strongest models in its weight class.' Vendor-published: AIME 2025 97.0%, AIME 2026 94.5%, matches leading models on SWE-Bench Pro, and in human-preference testing 'users preferred MAI-Thinking-1 over Claude Sonnet 4.6.' Architecture: 35B-active, ~1T-total sparse Mixture-of-Experts; 256K context; Chat Completions API compatible. Availability: private preview on Microsoft Foundry at launch + GA through OpenRouter / Fireworks / Baseten; MAI Playground public preview coming soon.Source: Microsoft AI (microsoft.ai/news/introducing-mai-thinking-1/, microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/) · 2026-06-02
- Self-reported benchmarks pending third-party verification. Microsoft's AIME and SWE-Bench Pro figures and the Sonnet 4.6 preference-win claim are first-party. Treat as vendor numbers until Artificial Analysis / LMArena-style independent confirmation lands (typically 4-8 weeks).Source: Microsoft model announcement · 2026-06
Best for
Azure / Microsoft Foundry shops that want a first-party reasoning model without an OpenAI dependency, and developers who want a cost-efficient reasoning tier (sparse MoE, 256K context) accessible today through OpenRouter, Fireworks, or Baseten.
Not for
Consumer users who want a chat UI (there is no claude.ai-style website -- this is API/Foundry only), teams that need open weights for self-hosting (use the local-LLM tier), or anyone who needs published per-token pricing before committing.
Our Verdict
MAI-Thinking-1 is the most strategically significant entry in Microsoft's June 2, 2026 Build model wave: it is Microsoft's first in-house reasoning model, and the published numbers (AIME 2025 97.0%, SWE-Bench Pro parity with leading models, a human-preference win over Claude Sonnet 4.6) are genuinely competitive for a 35B-active model. Combined with MAI-Code, MAI-Image, MAI-Voice, and MAI-Transcribe, it completes Microsoft's in-house coverage of every major modality and removes the last big OpenAI dependency at the reasoning tier. The open questions are pricing (undisclosed) and independent verification of the benchmarks -- but for Foundry customers this is now a real first-party reasoning option, and the day-one OpenRouter/Fireworks/Baseten availability makes it easy to try.
Sources
- Microsoft AI: Introducing MAI-Thinking-1 (2026-06-02) (accessed 2026-06-02)
- Microsoft AI: Building a hill-climbing machine -- launching seven new MAI models (2026-06-02) (accessed 2026-06-02)
Explore more Microsoft MAI-Thinking-1 rankings
Deeper leaderboards, benchmarks, task-specific tier lists, and status/pricing pages for Microsoft MAI-Thinking-1.
The Tier List Tuesday
Weekly newsletter: tier movers, new entrants, and the VS of the week. Built from our daily AI-tool sweeps. No spam, unsubscribe anytime.
Alternatives to Microsoft MAI-Thinking-1
Claude (Anthropic)
Anthropic's flagship LLM -- Opus 4.8 (launched May 28, 2026) with 1M-token context, high-res vision, user-facing effort control, and a cheaper fast mode (2.5x speed, now 3x cheaper than before). Anthropic says 4.8 is a more effective agentic collaborator with sharper judgment. Standard pricing unchanged at $5/$25 per 1M. Note: 2026-04-04 policy excluded third-party agent harnesses (OpenClaw etc.) from Pro/Max flat-rate, and 2026-04-16 Enterprise pricing dropped bundled tokens
Claude Mythos Preview
Anthropic's most capable model -- a gated research preview via Project Glasswing, cybersecurity-specialized. 73% success on expert CTF tasks, 32-step autonomous network attacks. Not generally available.
Gemini (Google)
Google's LLM with deep Google Workspace integration, 2M token context window, and native code execution -- Gemini 3.5 Flash GA 2026-05-19 (I/O 2026), Gemini 3.5 Pro rolling out June 2026, Gemini Spark agent + Managed Agents public preview in the Gemini API
Grok
xAI's irreverent chatbot with a direct line to X/Twitter -- real-time data meets unfiltered personality. Grok 4.3 production launched 2026-05-02 with Custom Voices cloning + Imagine Agent Mode + ~40% API price cut to $1.25/$2.50 per 1M tokens
Muse Spark (Meta)
Meta's first model from its Superintelligence Lab -- natively multimodal with Contemplating mode for multi-agent reasoning
GPT-Rosalind (OpenAI)
OpenAI's first domain-specific model -- life sciences, drug discovery, translational medicine. Launched 2026-04-16 as a Trusted Access research preview. Launch partners: Amgen, Moderna, Allen Institute, Thermo Fisher. Paired with a Life Sciences Codex plugin (50+ scientific tool integrations)
GPT-5.4-Cyber (OpenAI)
OpenAI's defensive-cybersecurity variant of GPT-5.4, launched 2026-04-16. Lowered refusal boundary for security-research tasks and native binary reverse-engineering. Access gated via Trusted Access for Cyber (TAC) program -- thousands of verified defenders, hundreds of teams, no public pricing
Hunyuan 3 (Tencent Hy3)
Tencent's Hy3 Preview launched 2026-04-23 -- 295B total / 21B active MoE, 256K context, open-sourced on HuggingFace under tencent/Hy3-preview. Cheapest frontier-class API at ~1.2 RMB per million input tokens. Integrated into Yuanbao, WeChat, QQ
MiMo (Xiaomi)
Xiaomi's MiMo-V2.5 family launched 2026-04-22 -- Pro (1T total / 42B active MoE, 1M context, native vision+audio reasoning), Multimodal base, TTS (3 sub-models: base, VoiceDesign, VoiceClone), and ASR (open-source, English + Chinese + major dialects). Full voice pipeline for the agent era. Extra-charge 1M-context tier removed at launch