Gemini (Google)
A Tier · 8.3/10
Google's LLM with deep Google Workspace integration, 2M token context window, and native code execution
Score Breakdown
Benchmark Scores
Benchmarks for Gemini 3.1 Ultra
| Benchmark | Description | Score |
|---|---|---|
| MMLU | Knowledge across 57 subjects | 90.5% |
| GPQA Diamond | Graduate-level science questions | 94.3% |
| HumanEval | Python code generation | 93.5% |
| SWE-bench | Real GitHub issue fixing | 80.6% |
| ARC-AGI | Abstract reasoning puzzles | 77.1% |
Last updated: 2026-04-13
Personality & Tone
The Google research assistant
Tone: Neutral, thorough, and slightly corporate. Gemini leans academic, cites sources readily in Deep Research mode, and keeps its tone even across topics -- rarely funny, rarely snarky.
Quirks: Tightly integrated with Google products -- pulls from Search and Workspace by default, which is useful for grounded answers but means you hear Google's worldview. Can feel evasive or overly safe on opinionated or politically charged questions.
The Good and the Bad
What we like
- 2 million token context window is the largest available -- can process entire books and full codebases in one prompt
- Best Google Workspace integration (Gmail, Docs, Drive, Calendar)
- Free tier is more generous than Claude's
- Gemini Advanced includes 2TB of Google One storage -- real added value
- API pricing is very competitive, especially for the Flash model
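The 2-million-token claim above is easy to sanity-check against a real codebase. A minimal sketch using the common ~4 characters-per-token heuristic (an assumption -- the actual ratio varies by tokenizer and file type, so treat the result as a rough estimate):

```python
import os

CONTEXT_TOKENS = 2_000_000   # Gemini's advertised context window
CHARS_PER_TOKEN = 4          # rough heuristic; varies by tokenizer

def estimated_tokens(root: str, exts=(".py", ".md")) -> int:
    """Sum file sizes under root and convert characters to tokens."""
    total_chars = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(exts):
                total_chars += os.path.getsize(os.path.join(dirpath, name))
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(root: str) -> bool:
    """True if the estimated token count fits the 2M window."""
    return estimated_tokens(root) <= CONTEXT_TOKENS
```

In practice you would also want to subtract room for the prompt and the model's response, since the window covers input and output together.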
What could be better
- Output quality for creative writing is the weakest of the big three (GPT-4, Claude, Gemini)
- Hallucination rate is higher than Claude's in our testing
- Google's track record of killing products makes long-term commitment feel risky
- The Gemini app UI feels like Google slapped AI onto an existing product
Pricing
Free
- ✓Gemini 3.1 Flash
- ✓Basic features
- ✓Google integration
Google AI Pro
- ✓Gemini 3.1 Ultra
- ✓2M token context
- ✓Code Execution sandbox
- ✓2TB Google storage
- ✓Workspace integration
- ✓Lyria 3 access
Google AI Ultra
- ✓Gemini 3.1 Ultra (max usage)
- ✓Gemini 3.1 Flash Live audio
- ✓Lyria 3 Pro full access
- ✓Highest API priority
- ✓30TB Google storage
API
- ✓All models
- ✓2M context
- ✓Flash-Lite at $0.25/M input
- ✓Grounding with Google Search
- ✓Code Execution
- ✓Mandatory spend caps (April 2026)
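At the Flash-Lite rates listed above ($0.25/M input, $1.50/M output), per-request cost is simple to estimate. A minimal sketch -- the rates come from the pricing notes in this review and the token counts are illustrative, so verify against ai.google.dev/pricing before budgeting:

```python
# Estimate per-request Gemini Flash-Lite API cost from token counts.
# Rates per the pricing section above; confirm current figures
# on ai.google.dev/pricing before relying on them.
INPUT_RATE_PER_M = 0.25    # USD per 1M input tokens
OUTPUT_RATE_PER_M = 1.50   # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one API request."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# Example: a full 2M-token prompt with a short 1,000-token answer.
print(f"${request_cost(2_000_000, 1_000):.4f}")  # → $0.5015
```

The asymmetry matters: at these rates a maxed-out input context costs about 50 cents, while output tokens are six times as expensive per token.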
Known Issues
- GEMINI 3.1 FLASH-LITE GA (2026-05-07, TODAY): Generally available on the Gemini Enterprise Agent Platform. Fastest + most cost-efficient Gemini 3 series model. **2.5x faster Time-to-First-Answer-Token vs Gemini 2.5 Flash; +45% output speed**. Pricing per third-party reference: $0.25/M input, $1.50/M output (vendor blog itself omits direct pricing -- check ai.google.dev/pricing for canonical). Customer signals at GA: Gladly reports ~60% lower cost vs thinking-tier models; OffDeal cites sub-second p95 for classifiers. **This is a pre-Google-I/O staging signal** -- I/O 2026 keynote is 2026-05-19 (T-12 from this entry). Expect more pre-keynote drops over the next two weeks. Source: Google Cloud blog (cloud.google.com/blog/products/ai-machine-learning/gemini-3-1-flash-lite-is-now-generally-available), blog.google · 2026-05-07
- Gemini 2.5 family retirement dates EXTENDED (ai.google.dev deprecations page, checked 2026-04-24): Gemini 2.5 Pro, 2.5 Flash, AND 2.5 Flash-Lite now all retire 2026-10-16 (pushed out from the original 2026-06-17 / 2026-07-22 dates). Gives ~6 more months to migrate to gemini-3.1-pro + gemini-3-flash. Production code still calling 2.5 model names continues to work through Oct 16, but do not ship new code on retiring endpoints. Source: ai.google.dev/gemini-api/docs/deprecations (verified 2026-04-24) · 2026-04-24
- Gemini 3.1 Flash TTS launched 2026-04-15 as a preview on Gemini API, AI Studio, Vertex AI, and Google Vids. 70+ languages, audio tags for vocal style/pace/delivery embedded in the text prompt, Elo 1,211 on the Artificial Analysis TTS leaderboard. Positions Google as a direct competitor to ElevenLabs v3 on the TTS stack. Source: blog.google Gemini 3.1 Flash TTS, MarkTechPost · 2026-04
- Image generation of people was temporarily disabled after generating historically inaccurate results; it has been partially restored but is still limited. Source: The Verge, Google Blog · 2026-01
- Gemini Pro model access was removed from the free API tier on April 1, 2026 -- mandatory spend caps and prepaid billing are now required for new accounts. Source: Google AI for Developers, FindSkill.ai · 2026-04
- Google AI Ultra at $249.99/mo is hard to justify against Claude Max ($200) and ChatGPT Pro ($200) unless you specifically need Lyria 3 Pro. Source: Reddit r/Bard · 2026-04
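Given the 2.5-family retirement note above, teams can put a simple guard in front of model selection so new code never ships on a retiring endpoint. A sketch under stated assumptions: the replacement mapping follows the migration guidance quoted above (gemini-3.1-pro and gemini-3-flash), and the exact model ID strings are assumed forms -- confirm both against ai.google.dev/gemini-api/docs/deprecations:

```python
from datetime import date

# Gemini 2.5 Pro / Flash / Flash-Lite all retire 2026-10-16 per
# the deprecations note above. Mapping targets are assumptions
# based on the migration guidance; verify on ai.google.dev.
RETIREMENT = date(2026, 10, 16)
MIGRATION = {
    "gemini-2.5-pro": "gemini-3.1-pro",
    "gemini-2.5-flash": "gemini-3-flash",
    "gemini-2.5-flash-lite": "gemini-3-flash",
}

def check_model(name: str, today: date) -> str:
    """Swap retiring model names for replacements; hard-fail after retirement."""
    if name not in MIGRATION:
        return name  # not a retiring endpoint, pass through
    if today >= RETIREMENT:
        raise ValueError(f"{name} retired {RETIREMENT}; use {MIGRATION[name]}")
    return MIGRATION[name]  # still works, but migrate proactively

print(check_model("gemini-2.5-flash", date(2026, 6, 1)))  # → gemini-3-flash
```

Returning the replacement (rather than the old name) before the cutoff matches the note's advice: existing calls keep working through Oct 16, but nothing new should target the 2.5 endpoints.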
Best for
Google Workspace power users. If you live in Gmail, Docs, and Drive, Gemini Advanced integrates directly into your workflow. Also great for developers who need the cheapest API with the longest context window.
Not for
Anyone who needs the best raw output quality. Claude and GPT-4 both write better. Also not for anyone spooked by Google's history of abandoning products.
Our Verdict
Gemini's strength is the ecosystem play. The 2M context window is genuinely useful for long documents, and the Google Workspace integration is something neither OpenAI nor Anthropic can match. But purely as an LLM, the output quality is a step behind Claude and GPT-4. Pick Gemini if you're deep in Google's ecosystem; otherwise, the other two are better standalone.
Sources
- Google AI for Developers: deprecations (accessed 2026-04-21)
- Google Blog: Gemini 3.1 Flash TTS (accessed 2026-04-21)
- Google Gemini official site (accessed 2026-04-21)
- LMSYS Chatbot Arena rankings (accessed 2026-04-13)
- Reddit r/Bard (accessed 2026-04-13)
Alternatives to Gemini (Google)
Claude (Anthropic)
Anthropic's flagship LLM -- Opus 4.7 (launched April 16, 2026) with 1M-token context, high-res vision, new xhigh reasoning level, and the most natural conversational style. Note: 2026-04-04 policy excluded third-party agent harnesses (OpenClaw etc.) from Pro/Max flat-rate, and 2026-04-16 Enterprise pricing dropped bundled tokens
Claude Mythos Preview
Anthropic's most capable model -- a gated research preview via Project Glasswing, cybersecurity-specialized. 73% success on expert CTF tasks, 32-step autonomous network attacks. Not generally available.
Grok
xAI's irreverent chatbot with a direct line to X/Twitter -- real-time data meets unfiltered personality. Grok 4.3 production launched 2026-05-02 with Custom Voices cloning + Imagine Agent Mode + ~40% API price cut to $1.25/$2.50 per 1M tokens
Muse Spark (Meta)
Meta's first model from its Superintelligence Lab -- natively multimodal with Contemplating mode for multi-agent reasoning
GPT-Rosalind (OpenAI)
OpenAI's first domain-specific model -- life sciences, drug discovery, translational medicine. Launched 2026-04-16 as a Trusted Access research preview. Launch partners: Amgen, Moderna, Allen Institute, Thermo Fisher. Paired with a Life Sciences Codex plugin (50+ scientific tool integrations)
GPT-5.4-Cyber (OpenAI)
OpenAI's defensive-cybersecurity variant of GPT-5.4, launched 2026-04-16. Lowered refusal boundary for security-research tasks and native binary reverse-engineering. Access gated via Trusted Access for Cyber (TAC) program -- thousands of verified defenders, hundreds of teams, no public pricing
Hunyuan 3 (Tencent Hy3)
Tencent's Hy3 Preview launched 2026-04-23 -- 295B total / 21B active MoE, 256K context, open-sourced on HuggingFace under tencent/Hy3-preview. Cheapest frontier-class API at ~1.2 RMB per million input tokens. Integrated into Yuanbao, WeChat, QQ
MiMo (Xiaomi)
Xiaomi's MiMo-V2.5 family launched 2026-04-22 -- Pro (1T total / 42B active MoE, 1M context, native vision+audio reasoning), Multimodal base, TTS (3 sub-models: base, VoiceDesign, VoiceClone), and ASR (open-source, English + Chinese + major dialects). Full voice pipeline for the agent era. Extra-charge 1M-context tier removed at launch