Microsoft MAI-Voice-1

B Tier · 7.3/10

Microsoft's first in-house expressive TTS model -- launched 2026-04-02 on Azure Foundry. Generates 60s of audio in ~1s on a single GPU. Custom voice cloning from a few seconds of input. Powers Copilot, Bing, PowerPoint, and Azure Speech

Last updated: 2026-04-17 · Free tier available

Score Breakdown

Ease of Use: 6.0
Output Quality: 8.0
Value: 8.0
Features: 7.0

The Good and the Bad

What we like

  • Speed is the real headline -- 60 seconds of audio generated in about 1 second on a single GPU. That puts it in a different class from ElevenLabs or Voxtral for high-volume workflows where throughput matters more than the last ~5% of expressiveness
  • First-party Azure Foundry integration means Microsoft customers get a TTS option that doesn't involve an OpenAI dependency. For enterprises managing AI vendor concentration, this is a real advantage
  • Already in production at scale -- powers Copilot, Bing voice, PowerPoint narration, and Azure Speech as of launch. Not a research preview that might never ship
  • Custom voice cloning from a few seconds of input is competitive with ElevenLabs, and it sits inside the Azure-native security and compliance envelope that enterprise buyers actually need

What could be better

  • Not available as a consumer subscription. API-only pay-as-you-go on Foundry means you need an Azure account and engineering work to use it -- no claude.ai-style website for casual use
  • MAI Playground is US-only at public-preview launch -- international users get pushed straight to the API
  • Expressiveness trails ElevenLabs v3 on emotional range, laughter, sighs, and extended dramatic reading. MAI-Voice-1 optimizes for speed and scale, not nuance
  • Voice cloning raises the same policy concerns as ElevenLabs -- Microsoft has enterprise guardrails but you should still be careful about consent and deepfake risk

Pricing

Azure Foundry API

$22 per 1M characters
  • Pay-as-you-go on Azure Foundry
  • Public preview in Microsoft Foundry + MAI Playground (US only for Playground)
  • Custom voice cloning from a few seconds of audio
  • ~60s of audio generated in ~1s on a single GPU
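
As a rough budgeting aid, the per-character price above can be turned into a cost estimate. This is a minimal sketch that assumes billing is strictly linear in character count, with no minimum charge, rounding rule, or volume discount; none of those details are confirmed here. The function name `estimate_cost_usd` and the sample workload are hypothetical.

```python
from decimal import Decimal

# Price quoted in this review: $22 per 1M characters (pay-as-you-go).
PRICE_PER_MILLION_CHARS = Decimal("22")


def estimate_cost_usd(char_count: int) -> Decimal:
    """Estimate the charge in USD for synthesizing `char_count` characters.

    Assumption: cost scales linearly with characters, nothing else is billed.
    """
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    cost = PRICE_PER_MILLION_CHARS * char_count / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))  # round to whole cents


# Hypothetical workload: a 500,000-character manuscript.
print(estimate_cost_usd(500_000))  # 11.00
```

Check the Foundry model card for the actual billing rules before relying on numbers like these.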

MAI Playground (Free preview)

$0
  • US-only web playground for testing
  • Rate-limited preview access
  • No commercial use -- evaluation only

Bundled (Copilot / Bing / PowerPoint / Azure Speech)

Included
  • Existing Microsoft 365 Copilot subscriptions use MAI-Voice-1 under the hood
  • No separate configuration or pricing required for existing Microsoft customers

Known Issues

  • The MAI Playground public preview is US-only. The Foundry API works internationally, but you need an Azure subscription to test it
    Source: Microsoft AI launch post, Tech Community blog · 2026-04
  • Prior-sweep research incorrectly attributed a FLEURS WER #1 claim to MAI-Voice-1. That claim applies to MAI-Transcribe-1 (transcription), not Voice-1 (TTS). Voice-1's headline is speed, not word error rate (WER)
    Source: Microsoft model card corrections · 2026-04

Best for

Microsoft shops already on Azure who want a TTS option without an OpenAI dependency. Also good for any high-volume TTS workflow (audiobook batch generation, voicemail systems, IVR, bulk narration) where generating audio roughly 60x faster than real time matters more than ElevenLabs v3's slightly more expressive output.
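
To put the speed claim in batch terms, here is a back-of-the-envelope sketch. It assumes the quoted rate (about 60 seconds of audio per second of compute on one GPU) holds for long jobs and scales linearly across GPUs, and it ignores network, queueing, and API rate limits. Those are illustrative assumptions, not measured behavior, and `generation_seconds` is a hypothetical helper.

```python
# Figure quoted in this review: ~60 s of audio per ~1 s on a single GPU.
AUDIO_SECONDS_PER_COMPUTE_SECOND = 60.0


def generation_seconds(audio_seconds: float, gpus: int = 1) -> float:
    """Estimate wall-clock seconds needed to synthesize `audio_seconds` of audio.

    Assumption: throughput is constant and scales linearly with GPU count.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return audio_seconds / (AUDIO_SECONDS_PER_COMPUTE_SECOND * gpus)


# Hypothetical job: a 10-hour audiobook on one GPU.
ten_hours = 10 * 3600
print(generation_seconds(ten_hours))  # 600.0 seconds, i.e. about 10 minutes
```

Under these assumptions a 10-hour audiobook takes about 10 minutes of generation time, which is why throughput rather than expressiveness is the deciding factor for bulk work.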

Not for

Consumer creators who want a polished web UI with presets and style controls -- use ElevenLabs. Also not ideal if top-quartile emotional expressiveness (laughter, sighs, dramatic reading) is your requirement -- ElevenLabs v3 still wins there.

Our Verdict

MAI-Voice-1 is Microsoft's first named TTS model in the post-OpenAI-exclusivity era, and it signals how Microsoft plans to differentiate: speed and Azure-native integration over raw expressiveness. The 60s-in-1s throughput is legitimately class-leading, and for any Microsoft shop doing high-volume voice generation it removes the ElevenLabs line item. For consumer creators, ElevenLabs v3 remains the better product. For enterprise or scale workflows on Azure, MAI-Voice-1 is now the default answer.

Sources

  • Microsoft AI: 3 new MAI models in Foundry (accessed 2026-04-17)
  • Microsoft Community Hub: MAI models in Foundry (accessed 2026-04-17)
  • MAI-Voice-1 Foundry model card (accessed 2026-04-17)