
Grok Speech (STT + TTS APIs) Pricing

All plans and pricing as of 2026-04-18

3 plans · Excellent value score (9/10)
Most Popular

Speech to Text (batch)

$0.10 per hour
  • REST API for large audio files
  • Word-level timestamps
  • Speaker diarization
  • Multichannel support
  • Inverse Text Normalization (numbers, dates, currencies)

Speech to Text (streaming)

$0.20 per hour
  • Lowest-latency WebSocket API
  • Real-time speaker ID
  • Same accuracy as batch
  • Supports 25+ languages seamlessly

Text to Speech

$4.20 per 1M characters
  • Natural expressive voices (ARA voice etc.)
  • Speech tags: [laugh], [sigh], [whisper], <emphasis>, <slow>, <pause>
  • REST + WebSocket streaming
  • Usage-based billing, no hidden fees
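To make the three list prices easier to compare against a real workload, here is a rough cost sketch in Python. It uses only the per-unit prices listed above; the function name and parameters are hypothetical, and it assumes strictly linear usage-based billing with no minimums, billing increments, or volume discounts, none of which this page specifies.

```python
from decimal import Decimal

# List prices copied from the plans above (as of 2026-04-18).
STT_BATCH_PER_HOUR = Decimal("0.10")       # USD per hour of audio
STT_STREAMING_PER_HOUR = Decimal("0.20")   # USD per hour of audio
TTS_PER_MILLION_CHARS = Decimal("4.20")    # USD per 1M characters


def estimate_monthly_cost(batch_hours=0, streaming_hours=0, tts_characters=0):
    """Estimate spend in USD, rounded to cents.

    Assumption (not stated on the pricing page): billing is purely linear,
    with no minimum charge, rounding increment, or volume discount.
    """
    # Convert through str() so float inputs such as 12.5 stay exact.
    batch = STT_BATCH_PER_HOUR * Decimal(str(batch_hours))
    streaming = STT_STREAMING_PER_HOUR * Decimal(str(streaming_hours))
    tts = TTS_PER_MILLION_CHARS * Decimal(str(tts_characters)) / Decimal(1_000_000)
    return (batch + streaming + tts).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 1,000 batch hours + 500 streaming hours + 10M TTS characters
    print(estimate_monthly_cost(1000, 500, 10_000_000))
```

Under these assumptions, 1,000 batch hours ($100), 500 streaming hours ($100), and 10 million TTS characters ($42) come to $242.00.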

Is Grok Speech (STT + TTS APIs) Worth the Price?


Value Score: 9/10

Overall Score: 8.1/10 · Best for developers building voice agents, real-time transcription tools, accessibility features, or high-volume TTS workloads where the cost per hour of audio matters at scale. Strong fit for phone-call and meeting transcription use cases, where xAI's published WER advantage (5.0% on phone-call entities vs. 12.0% for ElevenLabs) adds up quickly.

Grok Speech is xAI's clearest 'we are a platform, not just a chatbot' shot at the voice-API category, and on day-one pricing alone it is a credible threat to ElevenLabs, Deepgram, and AssemblyAI for production STT workloads. The published WER numbers are aggressive but plausible given the Tesla / Starlink deployment footprint. TTS at $4.20 per 1M characters with real expressive tags undercuts ElevenLabs on price while narrowing the expressiveness gap. The open questions are (1) how it handles long-tail accents and non-English audio in practice, (2) whether the post-SpaceX procurement pathway slows enterprise adoption, and (3) how ElevenLabs responds on price. For new voice-API buyers shipping in Q2 2026, Grok Speech is now a first-call option alongside ElevenLabs and Deepgram.

How Grok Speech (STT + TTS APIs) Pricing Compares

Tool | Free Tier | Starting Price | Value Score | Overall
Grok Speech (STT + TTS APIs) (this tool) | No | $0.10 per hour | 9/10 | 8.1
ElevenLabs | Yes | $0 | 7/10 | 8.5
Descript | Yes | $0 | 8/10 | 8.5
Cohere Transcribe | Yes | $0 | 9/10 | 8.0
Microsoft MAI-Voice-1 | Yes | $22 per 1M characters | 8/10 | 7.3
Murf AI | Yes | $0 | 6/10 | 7.0