Grok Speech (STT + TTS APIs) Pricing
All plans and pricing as of 2026-04-18
Speech to Text (batch)
- ✓ REST API for large audio files
- ✓ Word-level timestamps
- ✓ Speaker diarization
- ✓ Multichannel support
- ✓ Inverse Text Normalization (numbers, dates, currencies)
Speech to Text (streaming)
- ✓ Lowest-latency WebSocket API
- ✓ Real-time speaker ID
- ✓ Same accuracy as batch
- ✓ Supports 25+ languages seamlessly
Text to Speech
- ✓ Natural expressive voices (ARA voice etc.)
- ✓ Speech tags: [laugh], [sigh], [whisper], <emphasis>, <slow>, <pause>
- ✓ REST + WebSocket streaming
- ✓ Usage-based billing, no hidden fees
Is Grok Speech (STT + TTS APIs) Worth the Price?
Value Score: 9/10
Overall Score: 8.1/10 · Best for: developers building voice agents, real-time transcription tools, accessibility features, or high-volume TTS workloads where the cost per hour of audio matters at scale. Strong fit for phone-call and meeting transcription, where xAI's published word error rate (WER) advantage (5.0% on phone-call entities vs. 12.0% for ElevenLabs) compounds quickly.
Grok Speech is xAI's clearest 'we are a platform, not just a chatbot' shot at the voice-API category, and on day-one pricing alone it is a credible threat to ElevenLabs, Deepgram, and AssemblyAI for production STT workloads. The published WER numbers are aggressive but plausible given the Tesla / Starlink deployment footprint. TTS at $4.20 per 1M characters with real expressive tags undercuts ElevenLabs on price while narrowing the expressiveness gap. The open questions are (1) how it handles long-tail accents and non-English audio in practice, (2) whether the post-SpaceX procurement pathway slows enterprise adoption, and (3) how ElevenLabs responds on price. For new voice-API buyers shipping in Q2 2026, Grok Speech is now a first-call option alongside ElevenLabs and Deepgram.
How Grok Speech (STT + TTS APIs) Pricing Compares
| Tool | Free Tier | Starting Price | Value Score | Overall |
|---|---|---|---|---|
| Grok Speech (STT + TTS APIs) (this tool) | No | $0.10 per hour | 9/10 | 8.1 |
| ElevenLabs | Yes | $0 | 7/10 | 8.5 |
| Descript | Yes | $0 | 8/10 | 8.5 |
| Cohere Transcribe | Yes | $0 | 9/10 | 8.0 |
| Microsoft MAI-Voice-1 | Yes | $22 per 1M characters | 8/10 | 7.3 |
| Murf AI | Yes | $0 | 6/10 | 7.0 |
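
To see how the usage-based rates on this page add up at volume, here is a minimal Python sketch. It assumes the $0.10 per hour starting price in the table applies to STT audio and that TTS is billed linearly at the quoted $4.20 per 1M characters. The function names and the sample workload (10,000 audio hours, 50 million characters) are hypothetical, chosen only for illustration. Actual invoices may differ by tier, minimums, or streaming vs. batch rates, none of which this page specifies.

```python
# Rates copied from this page. Which workloads each rate covers is an assumption.
STT_USD_PER_HOUR = 0.10                  # Grok Speech "Starting Price" in the table
TTS_USD_PER_MILLION_CHARS = 4.20         # Grok Speech TTS rate quoted in the review
MAI_VOICE_USD_PER_MILLION_CHARS = 22.00  # Microsoft MAI-Voice-1 starting price


def stt_cost(audio_hours: float, usd_per_hour: float = STT_USD_PER_HOUR) -> float:
    """Estimated transcription cost, assuming simple linear per-hour billing."""
    return audio_hours * usd_per_hour


def tts_cost(characters: int,
             usd_per_million: float = TTS_USD_PER_MILLION_CHARS) -> float:
    """Estimated synthesis cost, assuming simple linear per-character billing."""
    return characters / 1_000_000 * usd_per_million


# Hypothetical monthly workload, for illustration only.
hours = 10_000
chars = 50_000_000

print(f"STT, {hours:,} hours: ${stt_cost(hours):,.2f}")
print(f"TTS, {chars:,} chars at $4.20 per 1M: ${tts_cost(chars):,.2f}")
print(f"TTS, {chars:,} chars at $22 per 1M: "
      f"${tts_cost(chars, MAI_VOICE_USD_PER_MILLION_CHARS):,.2f}")
```

Swapping in a different per-hour or per-million-character rate gives a quick side-by-side for any other tool in the table that publishes usage-based pricing.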