Microsoft MAI-Thinking-1 logo
B

Microsoft MAI-Thinking-1

B Tier · 7.5/10

Microsoft's first in-house reasoning model -- launched 2026-06-02 at Build as the flagship of seven new MAI models. 35B-active / ~1T-total sparse Mixture-of-Experts, 256K context. AIME 2025 97.0%, matches leading models on SWE-Bench Pro, and beat Claude Sonnet 4.6 in human-preference testing. Available on Microsoft Foundry + OpenRouter / Fireworks / Baseten

Last updated: 2026-06-02

Score Breakdown

6.0
Ease of Use
8.5
Output Quality
7.5
Value
8.0
Features

Benchmark Scores

Benchmarks for MAI-Thinking-1 (vendor-published 2026-06-02; third-party verification pending)

BenchmarkScore
AIME 202597%
AIME 202694.5%

Last updated: 2026-06-02

The Good and the Bad

What we like

  • +Microsoft's first in-house frontier-class reasoning model -- removes the OpenAI dependency for the reasoning tier the same way MAI-Voice/Image/Transcribe did for speech and vision
  • +Strong published reasoning numbers: AIME 2025 97.0% and AIME 2026 94.5%, plus it 'matches leading models on key software engineering benchmarks' (SWE-Bench Pro) at a medium model size
  • +Sparse MoE design (35B active of ~1T total) is built for cost-efficiency -- Microsoft is positioning it as strong reasoning per dollar rather than a max-size flagship
  • +256K context + Chat Completions API compatibility makes it a near drop-in for existing reasoning workloads, and it shipped with day-one availability on OpenRouter, Fireworks, and Baseten

What could be better

  • Pricing not disclosed at launch -- you cannot yet model cost vs Claude / Gemini / GPT reasoning tiers without going through a third-party inference provider's published rate
  • Foundry access is private preview at launch (MAI Playground public preview is 'coming soon') -- broad self-serve evaluation is not open yet
  • Microsoft's benchmark and preference numbers are self-reported; independent AIME / SWE-Bench Pro confirmation typically lags announcement by several weeks
  • No open weights -- this is an API/Foundry model, not a self-hostable open-weight release like Phi or the local-LLM tier

Pricing

Microsoft Foundry

Not disclosed
  • Launched 2026-06-02 (Microsoft Build) -- private preview on Foundry at launch
  • 256K-token context window
  • Chat Completions API compatible
  • Public preview on MAI Playground coming soon

Third-party inference (OpenRouter / Fireworks / Baseten)

Provider-set
  • Generally available at launch through OpenRouter, Fireworks, and Baseten
  • Pay-as-you-go at each provider's per-token rate
  • No first-party consumer subscription -- API/developer access only

Known Issues

  • LAUNCH (2026-06-02, Microsoft Build): MAI-Thinking-1 is the flagship of the 'seven new MAI models' wave (alongside MAI-Code-1-Flash, MAI-Image-2.5 + Flash, MAI-Transcribe-1.5, MAI-Voice-2 + Flash). Microsoft describes it as 'a medium-sized model that stands among the strongest models in its weight class.' Vendor-published: AIME 2025 97.0%, AIME 2026 94.5%, matches leading models on SWE-Bench Pro, and in human-preference testing 'users preferred MAI-Thinking-1 over Claude Sonnet 4.6.' Architecture: 35B-active, ~1T-total sparse Mixture-of-Experts; 256K context; Chat Completions API compatible. Availability: private preview on Microsoft Foundry at launch + GA through OpenRouter / Fireworks / Baseten; MAI Playground public preview coming soon.Source: Microsoft AI (microsoft.ai/news/introducing-mai-thinking-1/, microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/) · 2026-06-02
  • Self-reported benchmarks pending third-party verification. Microsoft's AIME and SWE-Bench Pro figures and the Sonnet 4.6 preference-win claim are first-party. Treat as vendor numbers until Artificial Analysis / LMArena-style independent confirmation lands (typically 4-8 weeks).Source: Microsoft model announcement · 2026-06

Best for

Azure / Microsoft Foundry shops that want a first-party reasoning model without an OpenAI dependency, and developers who want a cost-efficient reasoning tier (sparse MoE, 256K context) accessible today through OpenRouter, Fireworks, or Baseten.

Not for

Consumer users who want a chat UI (there is no claude.ai-style website -- this is API/Foundry only), teams that need open weights for self-hosting (use the local-LLM tier), or anyone who needs published per-token pricing before committing.

Our Verdict

MAI-Thinking-1 is the most strategically significant entry in Microsoft's June 2, 2026 Build model wave: it is Microsoft's first in-house reasoning model, and the published numbers (AIME 2025 97.0%, SWE-Bench Pro parity with leading models, a human-preference win over Claude Sonnet 4.6) are genuinely competitive for a 35B-active model. Combined with MAI-Code, MAI-Image, MAI-Voice, and MAI-Transcribe, it completes Microsoft's in-house coverage of every major modality and removes the last big OpenAI dependency at the reasoning tier. The open questions are pricing (undisclosed) and independent verification of the benchmarks -- but for Foundry customers this is now a real first-party reasoning option, and the day-one OpenRouter/Fireworks/Baseten availability makes it easy to try.

Sources

  • Microsoft AI: Introducing MAI-Thinking-1 (2026-06-02) (accessed 2026-06-02)
  • Microsoft AI: Building a hill-climbing machine -- launching seven new MAI models (2026-06-02) (accessed 2026-06-02)

The Tier List Tuesday

Weekly newsletter: tier movers, new entrants, and the VS of the week. Built from our daily AI-tool sweeps. No spam, unsubscribe anytime.

Alternatives to Microsoft MAI-Thinking-1

Claude (Anthropic) logo

Claude (Anthropic)

Anthropic's flagship LLM -- Opus 4.8 (launched May 28, 2026) with 1M-token context, high-res vision, user-facing effort control, and a cheaper fast mode (2.5x speed, now 3x cheaper than before). Anthropic says 4.8 is a more effective agentic collaborator with sharper judgment. Standard pricing unchanged at $5/$25 per 1M. Note: 2026-04-04 policy excluded third-party agent harnesses (OpenClaw etc.) from Pro/Max flat-rate, and 2026-04-16 Enterprise pricing dropped bundled tokens

A
8.5/10
Free tierFrom $0
Best writing quality of any LLM -- Opus ...1M token context window for enterprise A...
Updated 2026-06-02
Claude Mythos Preview logo

Claude Mythos Preview

Anthropic's most capable model -- a gated research preview via Project Glasswing, cybersecurity-specialized. 73% success on expert CTF tasks, 32-step autonomous network attacks. Not generally available.

C
6.5/10
From Invite only
The most capable Anthropic model availab...73% success rate on expert-level Capture...
Updated 2026-04-20
Gemini (Google) logo

Gemini (Google)

Google's LLM with deep Google Workspace integration, 2M token context window, and native code execution -- Gemini 3.5 Flash GA 2026-05-19 (I/O 2026), Gemini 3.5 Pro rolling out June 2026, Gemini Spark agent + Managed Agents public preview in the Gemini API

A
8.3/10
Free tierFrom $0
2 million token context window is the la...Best Google Workspace integration (Gmail...
Updated 2026-06-02
Grok logo

Grok

xAI's irreverent chatbot with a direct line to X/Twitter -- real-time data meets unfiltered personality. Grok 4.3 production launched 2026-05-02 with Custom Voices cloning + Imagine Agent Mode + ~40% API price cut to $1.25/$2.50 per 1M tokens

B
7.5/10
Free tierFrom $0
Real-time access to X/Twitter data is ge...Grok 3 benchmarks are competitive with G...
Updated 2026-05-21
Muse Spark (Meta) logo

Muse Spark (Meta)

Meta's first model from its Superintelligence Lab -- natively multimodal with Contemplating mode for multi-agent reasoning

A
8.8/10
Free tierFrom $0
Completely free to use via Meta AI app a...Natively multimodal: handles text, image...
Updated 2026-04-19
GPT-Rosalind (OpenAI) logo

GPT-Rosalind (OpenAI)

OpenAI's first domain-specific model -- life sciences, drug discovery, translational medicine. Launched 2026-04-16 as a Trusted Access research preview. Launch partners: Amgen, Moderna, Allen Institute, Thermo Fisher. Paired with a Life Sciences Codex plugin (50+ scientific tool integrations)

C
6.8/10
From Invite only
OpenAI's first named vertical/domain-spe...Launch partners Amgen, Moderna, Allen In...
Updated 2026-04-17
GPT-5.4-Cyber (OpenAI) logo

GPT-5.4-Cyber (OpenAI)

OpenAI's defensive-cybersecurity variant of GPT-5.4, launched 2026-04-16. Lowered refusal boundary for security-research tasks and native binary reverse-engineering. Access gated via Trusted Access for Cyber (TAC) program -- thousands of verified defenders, hundreds of teams, no public pricing

B
7.2/10
From Not publicly disclosed
Directly competes with Claude Mythos Pre...Lowered refusal boundary on defensive-se...
Updated 2026-04-19
Hunyuan 3 (Tencent Hy3) logo

Hunyuan 3 (Tencent Hy3)

Tencent's Hy3 Preview launched 2026-04-23 -- 295B total / 21B active MoE, 256K context, open-sourced on HuggingFace under tencent/Hy3-preview. Cheapest frontier-class API at ~1.2 RMB per million input tokens. Integrated into Yuanbao, WeChat, QQ

A
8.1/10
Free tierFrom $0
Open weights from a top-3 Chinese tech c...Pricing is aggressive. ~1.2 RMB per mill...
Updated 2026-04-25
MiMo (Xiaomi) logo

MiMo (Xiaomi)

Xiaomi's MiMo-V2.5 family launched 2026-04-22 -- Pro (1T total / 42B active MoE, 1M context, native vision+audio reasoning), Multimodal base, TTS (3 sub-models: base, VoiceDesign, VoiceClone), and ASR (open-source, English + Chinese + major dialects). Full voice pipeline for the agent era. Extra-charge 1M-context tier removed at launch

A
8.3/10
Free tierFrom $0
Full voice pipeline shipped together: a ...Native multimodal in MiMo-V2.5-Pro is th...
Updated 2026-04-25