Microsoft MAI-Thinking-1

B Tier · 7.5/10

Microsoft's first in-house reasoning model -- launched 2026-06-02 at Build as the flagship of seven new MAI models. 35B-active / ~1T-total sparse Mixture-of-Experts, 256K context. AIME 2025 97.0%, matches leading models on SWE-Bench Pro, and beat Claude Sonnet 4.6 in human-preference testing. Available on Microsoft Foundry + OpenRouter / Fireworks / Baseten

Last updated: 2026-06-02

Score Breakdown

6.0

Ease of Use

8.5

Output Quality

7.5

Value

8.0

Features

Benchmark Scores

Benchmarks for MAI-Thinking-1 (vendor-published 2026-06-02; third-party verification pending)

Benchmark	Description	Score
AIME 2025		97%
AIME 2026		94.5%

Last updated: 2026-06-02

Visit Microsoft MAI-Thinking-1

The Good and the Bad

What we like

+Microsoft's first in-house frontier-class reasoning model -- removes the OpenAI dependency for the reasoning tier the same way MAI-Voice/Image/Transcribe did for speech and vision
+Strong published reasoning numbers: AIME 2025 97.0% and AIME 2026 94.5%, plus it 'matches leading models on key software engineering benchmarks' (SWE-Bench Pro) at a medium model size
+Sparse MoE design (35B active of ~1T total) is built for cost-efficiency -- Microsoft is positioning it as strong reasoning per dollar rather than a max-size flagship
+256K context + Chat Completions API compatibility makes it a near drop-in for existing reasoning workloads, and it shipped with day-one availability on OpenRouter, Fireworks, and Baseten

What could be better

−Pricing not disclosed at launch -- you cannot yet model cost vs Claude / Gemini / GPT reasoning tiers without going through a third-party inference provider's published rate
−Foundry access is private preview at launch (MAI Playground public preview is 'coming soon') -- broad self-serve evaluation is not open yet
−Microsoft's benchmark and preference numbers are self-reported; independent AIME / SWE-Bench Pro confirmation typically lags announcement by several weeks
−No open weights -- this is an API/Foundry model, not a self-hostable open-weight release like Phi or the local-LLM tier

Pricing

Microsoft Foundry

Not disclosed

✓Launched 2026-06-02 (Microsoft Build) -- private preview on Foundry at launch
✓256K-token context window
✓Chat Completions API compatible
✓Public preview on MAI Playground coming soon

Third-party inference (OpenRouter / Fireworks / Baseten)

Provider-set

✓Generally available at launch through OpenRouter, Fireworks, and Baseten
✓Pay-as-you-go at each provider's per-token rate
✓No first-party consumer subscription -- API/developer access only

Known Issues

LAUNCH (2026-06-02, Microsoft Build): MAI-Thinking-1 is the flagship of the 'seven new MAI models' wave (alongside MAI-Code-1-Flash, MAI-Image-2.5 + Flash, MAI-Transcribe-1.5, MAI-Voice-2 + Flash). Microsoft describes it as 'a medium-sized model that stands among the strongest models in its weight class.' Vendor-published: AIME 2025 97.0%, AIME 2026 94.5%, matches leading models on SWE-Bench Pro, and in human-preference testing 'users preferred MAI-Thinking-1 over Claude Sonnet 4.6.' Architecture: 35B-active, ~1T-total sparse Mixture-of-Experts; 256K context; Chat Completions API compatible. Availability: private preview on Microsoft Foundry at launch + GA through OpenRouter / Fireworks / Baseten; MAI Playground public preview coming soon.Source: Microsoft AI (microsoft.ai/news/introducing-mai-thinking-1/, microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/) · 2026-06-02
Self-reported benchmarks pending third-party verification. Microsoft's AIME and SWE-Bench Pro figures and the Sonnet 4.6 preference-win claim are first-party. Treat as vendor numbers until Artificial Analysis / LMArena-style independent confirmation lands (typically 4-8 weeks).Source: Microsoft model announcement · 2026-06

Best for

Azure / Microsoft Foundry shops that want a first-party reasoning model without an OpenAI dependency, and developers who want a cost-efficient reasoning tier (sparse MoE, 256K context) accessible today through OpenRouter, Fireworks, or Baseten.

Not for

Consumer users who want a chat UI (there is no claude.ai-style website -- this is API/Foundry only), teams that need open weights for self-hosting (use the local-LLM tier), or anyone who needs published per-token pricing before committing.

Our Verdict

MAI-Thinking-1 is the most strategically significant entry in Microsoft's June 2, 2026 Build model wave: it is Microsoft's first in-house reasoning model, and the published numbers (AIME 2025 97.0%, SWE-Bench Pro parity with leading models, a human-preference win over Claude Sonnet 4.6) are genuinely competitive for a 35B-active model. Combined with MAI-Code, MAI-Image, MAI-Voice, and MAI-Transcribe, it completes Microsoft's in-house coverage of every major modality and removes the last big OpenAI dependency at the reasoning tier. The open questions are pricing (undisclosed) and independent verification of the benchmarks -- but for Foundry customers this is now a real first-party reasoning option, and the day-one OpenRouter/Fireworks/Baseten availability makes it easy to try.

Sources

Microsoft AI: Introducing MAI-Thinking-1 (2026-06-02) (accessed 2026-06-02)
Microsoft AI: Building a hill-climbing machine -- launching seven new MAI models (2026-06-02) (accessed 2026-06-02)

Explore more Microsoft MAI-Thinking-1 rankings

Deeper leaderboards, benchmarks, task-specific tier lists, and status/pricing pages for Microsoft MAI-Thinking-1.

Full AI LLMs & Models tier list

Where Microsoft MAI-Thinking-1 ranks vs every competitor in its category

AIME leaderboard

The American Invitational Math Exam, used as a rolling frontier-math benchmark.

Best AI tools to research a topic

Research assistants that gather, cite, and synthesize sources across the web into a structured answer.

Best AI tools to answer questions from documents

Chat-with-your-docs tools that build a retrieval layer over PDFs, transcripts, and knowledge bases.

Is Microsoft MAI-Thinking-1 down?

Outage check plus rolling log of known issues

Microsoft MAI-Thinking-1 pricing

Every tier and what's included

Microsoft MAI-Thinking-1 alternatives

Comparable tools at every tier

The Tier List Tuesday

Weekly newsletter: tier movers, new entrants, and the VS of the week. Built from our daily AI-tool sweeps. No spam, unsubscribe anytime.

Alternatives to Microsoft MAI-Thinking-1

Claude (Anthropic)

Anthropic's flagship LLM family. After a 19-day US-government export-control suspension (June 12-30), Claude Fable 5 -- the first publicly available Mythos-class model -- returned globally on July 1, 2026. New on June 30: Claude Sonnet 5, the 'most agentic Sonnet yet,' now the default on Free/Pro at $2/$10 per 1M (intro through Aug 31, then $3/$15). Opus 4.8 remains the top-end flagship at $5/$25 per 1M with a 1M-token context, effort control, and a cheap fast mode

8.5/10

Free tierFrom $0

Best writing quality of any LLM -- Opus ...1M token context window for enterprise A...

Updated 2026-07-10

Claude Mythos 5

Anthropic's unrestricted frontier model -- launched June 9, 2026 alongside Claude Fable 5 (the same model made safe for general use). Suspended June 12 by a US export-control order, then PARTIALLY RESTORED July 1, 2026 (US government lifted controls June 30): Mythos 5 is back for a set of US organizations with government approval, while Anthropic works to re-expand the broader Glasswing program. Public Fable 5 returned globally the same day. Gated to Project Glasswing orgs + select biology researchers.

6.5/10

From Invite only

The most capable Anthropic model availab...73% success rate on expert-level Capture...

Updated 2026-07-04

Gemini (Google)

Google's LLM with deep Google Workspace integration, 2M token context window, and native code execution -- Gemini 3.5 Flash GA 2026-05-19 (I/O 2026, now with native computer use), Gemini 3.5 Pro still unshipped as of July 2026 (slipped past its June window), Gemini Spark agent + Managed Agents public preview in the Gemini API

8.3/10

Free tierFrom $0

2 million token context window is the la...Best Google Workspace integration (Gmail...

Updated 2026-07-05

Grok

SpaceXAI's irreverent chatbot with a direct line to X/Twitter -- and now Grok 4.5 (launched 2026-07-08), the frontier MoE model trained jointly with Cursor for coding, agentic tasks, and knowledge work at $2/$6 per 1M tokens. Grok 4.3 remains the value tier at $1.25/$2.50

7.5/10

Free tierFrom $0

Real-time access to X/Twitter data is ge...Grok 3 benchmarks are competitive with G...

Updated 2026-07-10

Muse Spark (Meta)

Meta's first model from its Superintelligence Lab -- natively multimodal with Contemplating mode for multi-agent reasoning

8.8/10

Free tierFrom $0

Completely free to use via Meta AI app a...Natively multimodal: handles text, image...

Updated 2026-04-19

GPT-Rosalind (OpenAI)

OpenAI's first domain-specific model -- life sciences, drug discovery, translational medicine. Launched 2026-04-16 as a Trusted Access research preview. Launch partners: Amgen, Moderna, Allen Institute, Thermo Fisher. Paired with a Life Sciences Codex plugin (50+ scientific tool integrations)

6.8/10

From Invite only

OpenAI's first named vertical/domain-spe...Launch partners Amgen, Moderna, Allen In...

Updated 2026-04-17

GPT-5.4-Cyber (OpenAI)

OpenAI's defensive-cybersecurity variant of GPT-5.4, launched 2026-04-16. Lowered refusal boundary for security-research tasks and native binary reverse-engineering. Access gated via Trusted Access for Cyber (TAC) program -- thousands of verified defenders, hundreds of teams, no public pricing

7.2/10

From Not publicly disclosed

Directly competes with Claude Mythos Pre...Lowered refusal boundary on defensive-se...

Updated 2026-04-19

Hunyuan 3 (Tencent Hy3)

Tencent's Hy3 Preview launched 2026-04-23 -- 295B total / 21B active MoE, 256K context, open-sourced on HuggingFace under tencent/Hy3-preview. Cheapest frontier-class API at ~1.2 RMB per million input tokens. Integrated into Yuanbao, WeChat, QQ

8.1/10

Free tierFrom $0

Open weights from a top-3 Chinese tech c...Pricing is aggressive. ~1.2 RMB per mill...

Updated 2026-04-25

MiMo (Xiaomi)

Xiaomi's MiMo-V2.5 family launched 2026-04-22 -- Pro (1T total / 42B active MoE, 1M context, native vision+audio reasoning), Multimodal base, TTS (3 sub-models: base, VoiceDesign, VoiceClone), and ASR (open-source, English + Chinese + major dialects). Full voice pipeline for the agent era. Extra-charge 1M-context tier removed at launch

8.3/10

Free tierFrom $0

Full voice pipeline shipped together: a ...Native multimodal in MiMo-V2.5-Pro is th...

Updated 2026-07-04