Grok
B Tier · 7.5/10
xAI's irreverent chatbot with a direct line to X/Twitter -- real-time data meets unfiltered personality. Grok 4.3 went to production on 2026-05-02 with Custom Voices voice cloning, Imagine Agent Mode, and an API price cut (~40% on input, ~60% on output) to $1.25/$2.50 per 1M tokens.
Score Breakdown
Benchmark Scores
Benchmarks for Grok 4.20
| Benchmark | Description | Score |
|---|---|---|
| MMLU | Knowledge across 57 subjects | 88.5% |
| GPQA Diamond | Graduate-level science questions | 85% |
| HumanEval | Python code generation | 90% |
| Humanity's Last Exam | Frontier difficulty questions | 50.7% |
Last updated: 2026-04-13
Personality & Tone
The irreverent contrarian
Tone: Casual, jokey, and willing to swear. Grok takes strong positions without hedging, leans into an edgy 'based' persona, and cracks jokes far more often than Claude, ChatGPT, or Gemini.
Quirks: Engages with topics other chatbots refuse, pulls live context from X so it reflects whatever is trending that hour, and will freely mock things -- including itself. In SuperGrok's multi-agent mode it can sound like several personalities arguing with each other.
The Good and the Bad
What we like
- +Real-time access to X/Twitter data is genuinely useful for tracking breaking news and trending topics
- +Grok 3 benchmarks are competitive with GPT-4o and Claude 3.5 -- this is not a vanity project anymore
- +The personality is refreshing if you're tired of overly cautious AI assistants -- it'll actually joke around
- +DeepSearch mode does solid multi-step research, pulling from web and X data simultaneously
What could be better
- −The snarky personality gets old fast when you're trying to get serious work done
- −Tied to the X ecosystem -- you need an X account, and the real-time data skews toward X's user base
- −SuperGrok at $30/mo is steep when Claude Pro and ChatGPT Plus are $20 with arguably better core models
- −Image generation and analysis capabilities lag behind what you get from ChatGPT or Gemini
Pricing
Free
- ✓~10 prompts per 2 hours
- ✓Basic Grok access
- ✓Requires X account
X Premium
- ✓Higher query limits
- ✓Grok 4.20 access
- ✓Bundled X social features
X Premium+
- ✓Higher Grok 4.20 access
- ✓Ad-free X
- ✓Priority responses
SuperGrok
- ✓Full Grok 4.20 (4-agent multi-agent system)
- ✓DeepSearch mode
- ✓Highest rate limits
- ✓Think mode
- ✓$300/yr option (~17% off the $30/mo rate)
SuperGrok Heavy
- ✓Grok 4 Heavy model
- ✓Highest priority
- ✓Multi-agent at scale
- ✓Note: Grok 4.3 beta-gating ended 2026-05-02
API (Grok 4.3)
- ✓Production launch 2026-05-02 (~40% input / ~60% output price cut vs 4.20)
- ✓1M context window
- ✓Reasoning tokens billed at output rate
- ✓Native video input + PDF/PPT/spreadsheet output
- ✓Custom Voices voice cloning free on console (80+ presets, 28 languages)
- ✓Imagine Agent Mode (creative workflow agent, beta)
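To see how the API pricing above adds up, here is a minimal Python sketch that estimates the cost of a single request. The rates are the ones quoted in this review. The `Usage` class and `estimate_cost_usd` function are hypothetical names for illustration, not part of any xAI SDK, and the sketch assumes billing is linear in token count with no caching discounts or minimum charges.

```python
from dataclasses import dataclass

# Grok 4.3 rates as quoted in this review (USD per 1M tokens).
INPUT_RATE_PER_M = 1.25
OUTPUT_RATE_PER_M = 2.50


@dataclass
class Usage:
    """Token counts for one request (hypothetical structure, not xAI's schema)."""
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int = 0


def estimate_cost_usd(usage: Usage) -> float:
    """Estimate the cost of one request in USD.

    Reasoning tokens are charged at the output rate, per the pricing notes above.
    """
    input_cost = usage.input_tokens / 1_000_000 * INPUT_RATE_PER_M
    billed_output = usage.output_tokens + usage.reasoning_tokens
    output_cost = billed_output / 1_000_000 * OUTPUT_RATE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # 200K input tokens, 4K visible output tokens, 16K reasoning tokens.
    example = Usage(input_tokens=200_000, output_tokens=4_000, reasoning_tokens=16_000)
    print(f"${estimate_cost_usd(example):.4f}")  # prints $0.3000
```

In this example the reasoning tokens account for most of the output-side charge, so a long-reasoning request can cost noticeably more than its visible answer suggests.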
Known Issues
- GROK BUILD PLUGIN MARKETPLACE (2026-06-11, vendor-primary): xAI launched a built-in **plugin marketplace for Grok Build** -- plugins bundle skills, slash commands, agents, hooks, MCP servers, and LSPs; installs are commit-SHA-pinned for supply-chain safety; the catalog is open to community submissions via PR. Launch partners: MongoDB, Vercel, Sentry, Chrome DevTools, Cloudflare. Mirrors the plugin/extension pattern Claude Code and Gemini-CLI-era tooling established -- Grok Build is maturing fast for a product still labeled beta. Source: xAI news (x.ai/news/grok-plugin-marketplace), GitHub (github.com/xai-org/plugin-marketplace) · 2026-06-11
- JUNE CLUSTER (2026-06, all vendor-primary on x.ai/news): **Grok Imagine 1.5 Preview** (6/3) -- image-to-video generation up to 720p, available as an API preview. **Composer 2.5** (6/1) -- xAI's 'fast, SOTA model for long-running tasks,' now selectable in the Grok Build /models menu for SuperGrok and X Premium+ subscribers (NOT related to Cursor's Composer line despite the name). **Grok Build 0.1 on the API** (5/29) -- the coding-agent model behind Grok Build became directly callable via the xAI API: 256K context, always-on reasoning, text + image input. Grok Build itself ('Introducing Grok Build,' 5/25) is in early beta for ALL SuperGrok and X Premium+ subscribers -- broader than the original Heavy-tier-only gate. Also: Grok voice now powers Vapi (6/3) and Gopuff's 'Go' shopping agent (6/9). NOTE: 'Grok 5' / 'V9-Medium mid-June' claims remain aggregator-only with zero vendor signal -- not real until x.ai posts it. Source: xAI news (x.ai/news/grok-imagine-1-5, x.ai/news/composer-2-5, x.ai/news/grok-build-0-1, x.ai/news/grok-build-cli), x.ai/build/changelog · 2026-06-03
- PRODUCT (2026-05-18): xAI shipped **Grok Skills** -- a persistent-memory Skills layer on Grok 4.3. Skills are user-defined named capabilities that Grok carries across sessions on web / iOS / Android (recipe collection, code-review checklist, study-habits coach, etc.). Each Skill is a stored prompt + behavioral pattern that Grok consults when invoked by name. Storage is per-user and not shared across accounts. This moves Grok away from ChatGPT Memory-style passive recall and toward configurable named tools. It pairs with the 5/14 Grok Build CLI ship -- Skills are the consumer-facing persistent-state layer, Build CLI is the developer-facing one. It also matters for the 'agent goes where you go' competitive narrative alongside Codex on mobile (5/14), Cursor Jira integration (5/19), and Devin Windows VMs (5/21). Source: xAI news (x.ai/news), xAI release notes (docs.x.ai/developers/release-notes) · 2026-05-18
- MODEL LINEUP CONSOLIDATION (2026-05-15, went live 12:00 PT): xAI auto-redirected **8 deprecated model slugs** to grok-4.3 (or grok-imagine-image-quality for the image model). Affected slugs: grok-4-1-fast-reasoning, grok-4-1-fast-non-reasoning, grok-4-fast-reasoning, grok-4-fast-non-reasoning, grok-4-0709, grok-code-fast-1, grok-3, grok-imagine-image-pro. All requests now silently bill at grok-4.3 rates ($1.25 input / $2.50 output per 1M tokens). Anyone with these slugs pinned in production or referenced inside a Copilot/Cursor/Codex multi-model selector now pays the new rate without any code change. Migration path: explicitly switch to grok-4.3 in your model selector and audit token spend after 5/15, since the new rate may differ from what each deprecated slug was previously billed at. The grok-code-fast-1 slug retirement is the same event that took the model off GitHub Copilot's Chat/inline/agent surfaces on 5/15. Source: xAI docs (docs.x.ai/developers/migration/may-15-retirement) · 2026-05-15
- PRODUCT (2026-05-14): xAI launched **Grok Build CLI** in early beta -- an agentic terminal-native CLI for coding, app development, and workflow automation. Spawns up to **8 concurrent agents**. Powered by Grok 4.3 beta with a 16-agent Heavy architecture and a **2M token context window**. Vendor-primary launch posts at x.ai/news/grok-build-cli and x.ai/cli, plus Musk's public invitation to wider beta testers on X. **Access gate**: launched first to the SuperGrok Heavy tier ($299/mo, intro offer $99/mo for 6 months) -- not available to standard Premium / SuperGrok subscribers at launch. Positions Grok as a direct competitor to Claude Code, Codex CLI, and Cursor CLI for terminal-first agentic coding workflows. The 8-agent parallelism plus 2M context is the differentiator -- the longest context window of any production coding CLI as of 2026-05-14. Source: xAI news (x.ai/news/grok-build-cli), xAI product page (x.ai/cli), Musk on X · 2026-05-14
- xAI joined SpaceX on 2026-02-02 -- SpaceX acquired xAI. Procurement, billing, and compliance workflows now route through SpaceX's vendor pipeline. For regulated industries (healthcare, finance, US government) this may require re-qualifying xAI as a vendor even if Grok itself was previously approved. Source: xAI announcement (x.ai/news/xai-joins-spacex), SpaceX updates · 2026-02
- Grok Speech (STT + TTS) APIs launched 2026-04-17 as separate products from the chatbot -- see /tools/grok-voice on this site. Built on the same stack Grok Voice uses. Not included in Premium/SuperGrok consumer tiers; billed separately at $0.10/hr STT batch and $4.20/1M char TTS. Source: xAI Grok STT/TTS announcement · 2026-04
- Real-time X data can surface misinformation from viral posts without adequate fact-checking. Source: Reddit r/artificial · 2026-02
- Free tier rate limits are aggressive -- many users report hitting caps within a few queries. Source: X/Twitter user reports · 2026-03
- Grok 4.20's 4-agent system (Grok, Harper, Benjamin, Lucas) can take 30+ seconds on complex queries as the agents debate internally. Grok 4.20 Beta 2 (landed ~2026-04-07) improved instruction-following, reduced hallucinations, and brought better LaTeX and image search, which partially addresses the slowness and reliability complaints from early 4.20 feedback. Source: Reddit r/grok, IBTimes · 2026-04
- PRODUCTION LAUNCH (2026-05-02): Grok 4.3 went broadly available beyond the SuperGrok Heavy beta. New consumer + API features: **Custom Voices voice cloning suite** (clone voice from ~1 minute of speech in <2 minutes, two-stage passphrase + speaker-embedding consent gate, 80+ preset voices, 28 languages, free on console); **Imagine Agent Mode** (creative production workflow agent, beta); native video input + reasoning-by-default; native PDF / PowerPoint / spreadsheet output. **API pricing: $1.25 input / $2.50 output per 1M tokens** -- ~40% input cut + ~60% output cut vs Grok 4.20. 1M context window. Reasoning tokens billed at output rate. xAI's pattern is silent ship via grok.com model selector + console UI rather than vendor blog post -- vendor-primary verification through grok.com itself plus 4+ tier-1 press sources (VentureBeat, Winbuzzer, The Decoder, Phemex). Source: VentureBeat (venturebeat.com/technology/xai-launches-grok-4-3-at-an-aggressively-low-price-and-a-new-fast-powerful-voice-cloning-suite), Winbuzzer 2026-05-03, The Decoder, grok.com console · 2026-05-02
- Grok 4.3 Beta dropped 2026-04-17 as a SuperGrok Heavy exclusive ($300/mo tier). Elon Musk clarified on 2026-04-18 that the live checkpoint is ~0.5T params; the full 1T version is ~5 days from finishing training. Beta gating ENDED 2026-05-02 with broader rollout (see entry above). Source: PiunikaWeb, BuildFastWithAI, xAI release notes, Musk posts on X (2026-04-18) · 2026-04
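For teams affected by the 2026-05-15 model lineup consolidation listed above, a quick audit of configured model slugs might look like the sketch below. The slug list is copied from that entry. The mapping of `grok-imagine-image-pro` to `grok-imagine-image-quality` is our reading of "the image model" in that entry, and `find_deprecated_slugs` is a hypothetical helper, not an xAI tool.

```python
# Deprecated slug -> redirect target, per the 2026-05-15 entry in this review.
# Assumption: the image slug is the one redirected to grok-imagine-image-quality.
DEPRECATED_SLUGS = {
    "grok-4-1-fast-reasoning": "grok-4.3",
    "grok-4-1-fast-non-reasoning": "grok-4.3",
    "grok-4-fast-reasoning": "grok-4.3",
    "grok-4-fast-non-reasoning": "grok-4.3",
    "grok-4-0709": "grok-4.3",
    "grok-code-fast-1": "grok-4.3",
    "grok-3": "grok-4.3",
    "grok-imagine-image-pro": "grok-imagine-image-quality",
}


def find_deprecated_slugs(configured_models):
    """Return {slug: redirect_target} for every configured slug that is deprecated."""
    return {
        slug: DEPRECATED_SLUGS[slug]
        for slug in configured_models
        if slug in DEPRECATED_SLUGS
    }


if __name__ == "__main__":
    # Hypothetical list of model slugs pulled from an app's configuration.
    in_use = ["grok-3", "grok-4.3", "grok-code-fast-1"]
    for old, new in find_deprecated_slugs(in_use).items():
        print(f"{old} is redirected to {new}; pin {new} explicitly and re-check spend")
```

Running a check like this against every place a model slug is pinned (app config, CI, editor model selectors) makes the silent redirect visible before the bill does.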
Best for
People who live on X/Twitter and want an AI that can tap into that data in real-time. Also good for users who find mainstream chatbots too sanitized and want something with more personality.
Not for
Enterprise users who need reliable, consistent outputs. Also not the best pick if you don't use X -- the real-time data advantage disappears and you're left with a solid-but-not-best-in-class LLM.
Our Verdict
Grok has come a long way from being dismissed as Elon's pet project. The Grok 3 models are legitimately competitive, and the real-time X integration is a unique differentiator that no other chatbot can match. But the value proposition gets muddier when you strip away the X angle -- at $30/mo for SuperGrok, you're paying a premium for personality and Twitter data. If those matter to you, Grok is great. If not, Claude or ChatGPT give you more for less.
Sources
- xAI May 15 model retirement docs (accessed 2026-05-19)
- VentureBeat: xAI launches Grok 4.3 with voice cloning (2026-05-02) (accessed 2026-05-05)
- Winbuzzer: xAI Grok 4.3 + Custom Voices (2026-05-03) (accessed 2026-05-05)
- xAI official site (accessed 2026-04-17)
- xAI Grok 4.20 announcement (accessed 2026-04-17)
- IBTimes: Grok 4.20 Beta 2 April 2026 (accessed 2026-04-17)
- BuildFastWithAI: Grok 4.3 Beta 2026-04-17 (accessed 2026-04-17)
- Artificial Analysis: Grok 4.20 (accessed 2026-04-17)
- Reddit r/grok, r/artificial (accessed 2026-04-17)
Alternatives to Grok
Claude (Anthropic)
Anthropic's flagship LLM family. Claude Fable 5 (launched June 9, 2026) was the first publicly available Mythos-class model -- but on June 12, 2026 a US government export-control directive ordered access suspended, and Anthropic disabled Fable 5 + Mythos 5 for ALL customers to comply (every other Claude model is unaffected). Opus 4.8 (May 28) is the available flagship: $5/$25 per 1M, 1M-token context, effort control, and a cheap fast mode
Claude Mythos 5
Anthropic's unrestricted frontier model -- launched June 9, 2026 alongside Claude Fable 5 (the same model made safe for general use). ACCESS SUSPENDED June 12, 2026: a US government export-control directive forced Anthropic to disable both Mythos 5 and Fable 5 for all customers; all other Claude models are unaffected. Mythos 5 had been gated to ~150 Project Glasswing orgs and select biology researchers.
Gemini (Google)
Google's LLM with deep Google Workspace integration, 2M token context window, and native code execution -- Gemini 3.5 Flash GA 2026-05-19 (I/O 2026), Gemini 3.5 Pro rolling out June 2026, Gemini Spark agent + Managed Agents public preview in the Gemini API
Muse Spark (Meta)
Meta's first model from its Superintelligence Lab -- natively multimodal with Contemplating mode for multi-agent reasoning
GPT-Rosalind (OpenAI)
OpenAI's first domain-specific model -- life sciences, drug discovery, translational medicine. Launched 2026-04-16 as a Trusted Access research preview. Launch partners: Amgen, Moderna, Allen Institute, Thermo Fisher. Paired with a Life Sciences Codex plugin (50+ scientific tool integrations)
GPT-5.4-Cyber (OpenAI)
OpenAI's defensive-cybersecurity variant of GPT-5.4, launched 2026-04-16. Lowered refusal boundary for security-research tasks and native binary reverse-engineering. Access gated via Trusted Access for Cyber (TAC) program -- thousands of verified defenders, hundreds of teams, no public pricing
Microsoft MAI-Thinking-1
Microsoft's first in-house reasoning model -- launched 2026-06-02 at Build as the flagship of seven new MAI models. 35B-active / ~1T-total sparse Mixture-of-Experts, 256K context. AIME 2025 97.0%, matches leading models on SWE-Bench Pro, and beat Claude Sonnet 4.6 in human-preference testing. Available on Microsoft Foundry + OpenRouter / Fireworks / Baseten
Hunyuan 3 (Tencent Hy3)
Tencent's Hy3 Preview launched 2026-04-23 -- 295B total / 21B active MoE, 256K context, open-sourced on HuggingFace under tencent/Hy3-preview. Cheapest frontier-class API at ~1.2 RMB per million input tokens. Integrated into Yuanbao, WeChat, QQ
MiMo (Xiaomi)
Xiaomi's MiMo-V2.5 family launched 2026-04-22 -- Pro (1T total / 42B active MoE, 1M context, native vision+audio reasoning), Multimodal base, TTS (3 sub-models: base, VoiceDesign, VoiceClone), and ASR (open-source, English + Chinese + major dialects). Full voice pipeline for the agent era. Extra-charge 1M-context tier removed at launch