Claude (Anthropic)
A Tier · 8.5/10
Anthropic's flagship LLM -- Opus 4.7 (launched April 16, 2026) with 1M-token context, high-res vision, a new xhigh reasoning level, and the most natural conversational style. Note: a 2026-04-04 policy excluded third-party agent harnesses (OpenClaw etc.) from Pro/Max flat-rate plans, and on 2026-04-16 Enterprise pricing dropped bundled tokens.
Score Breakdown
Benchmark Scores
Benchmarks for Claude Opus 4.7. The scores shown are Opus 4.6 baselines; for 4.7, Anthropic announced a 13% coding lift and 3x production task completion.
| Benchmark | Description | Score |
|---|---|---|
| MMLU | Knowledge across 57 subjects | 91.3% |
| GPQA Diamond | Graduate-level science questions | 91.3% |
| AIME 2024 | Competition math problems | 99.8% |
| HumanEval | Python code generation | 94% |
| SWE-bench | Real GitHub issue fixing | 80.8% |
| ARC-AGI | Abstract reasoning puzzles | 75.2% |
Last updated: 2026-04-16
Personality & Tone
The thoughtful consultant
Tone: Measured, careful, and slightly formal. Claude explains tradeoffs rather than handing back one-liner answers, asks clarifying questions when a request is ambiguous, and hedges openly when it is not confident.
Quirks: More willing than most models to refuse edgy or ambiguous requests, pushes back on premises it disagrees with, and will flag when you are probably asking the wrong question instead of just answering the one you typed.
The Good and the Bad
What we like
- Best writing quality of any LLM -- Opus 4.7 outputs read like a human wrote them, not a robot, and instruction-following is substantially sharper than 4.6
- 1M token context window for enterprise API means it can process entire codebases, huge document sets, or long agent traces without chunking
- Opus 4.7 brings a real step-change in coding (~13% lift on benchmarks, 3x more production tasks resolved per Anthropic) and the new xhigh reasoning level lets you dial in the effort/latency tradeoff
- First Claude model with genuine high-res vision -- 3.75MP images (2,576px long edge, 3x prior limit) means charts, diagrams, whiteboards, and dense UIs finally work properly
What could be better
- Free tier is more limited than ChatGPT's -- you hit the cap faster
- No image generation built in (unlike ChatGPT with DALL-E)
- Fewer third-party integrations and plugins compared to OpenAI's ecosystem
- Can be overly cautious and refuse requests that are perfectly fine
Pricing
Free
- Limited messages/day
- Claude Sonnet 4.6
- Basic features
Pro
- 5x more usage than Free
- Claude Opus 4.7 + Sonnet 4.6
- Extended thinking, xhigh reasoning level
- Priority access
Max (5x)
- 5x Pro usage
- Priority queue
- Opus 4.7 with full xhigh + max reasoning
Max (20x)
- 20x Pro usage
- Highest priority
- All generally-available models
- Best for power users and agents
API (Opus 4.7)
- Unchanged from Opus 4.6 pricing
- 1M context window
- Tool use, MCP, high-res vision up to 3.75MP
- Bedrock, Vertex AI, Foundry
Known Issues
- PRODUCT + CAPACITY (2026-05-06 Code with Claude SF keynote): Anthropic announced a SpaceX compute partnership at Colossus 1 (300+ MW, 220,000+ NVIDIA GPUs, online 'within the month'). Concurrent product changes shipped the same day: (a) DOUBLED Claude Code 5-hour rate limits for Pro / Max / Team / seat-based Enterprise plans, (b) REMOVED the peak-hours reduction for Pro and Max (peak-hours throttling no longer applies), (c) RAISED API rate limits for Opus models (Opus 4.7 + Opus 4.6 throughput improved). Claude Managed Agents also shipped: 'Dreaming' (research preview -- agents review past sessions for self-improvement patterns), 'Outcomes' (public beta -- rubric-graded task success, lifted up to 10 points in tests), and 'Multiagent Orchestration' (public beta -- lead-agent delegates to subagents, e.g. Haiku lead with Opus subagents). Practical impact: existing Pro / Max users see materially more headroom on Claude Code overnight. NOTE: Sonnet 4.8 / Jupiter / Cardinal / KAIROS / Cowork / Undercover Mode -- speculated from the 2026-03-31 source-map leak -- did NOT ship at this keynote. The models page still lists Opus 4.7 / Sonnet 4.6 / Haiku 4.5 as the current trio. Source: Anthropic news (anthropic.com/news/higher-limits-spacex), Anthropic Managed Agents (claude.com/blog/new-in-claude-managed-agents), Simon Willison live blog, TheNewStack · 2026-05-06
- SECURITY (CVE-2026-41686, NVD-published 2026-05-04, GHSA-p7fg-763f-g4gf): Anthropic TypeScript SDK (`@anthropic-ai/sdk`) `BetaLocalFilesystemMemoryTool` writes memory files with mode 0o666 (world-readable and world-writable) and directories with mode 0o777 (world-readable, writable, and traversable). On shared hosts a local attacker can read persisted agent state; in containers with permissive umasks (typical Docker base images) an attacker with container access can poison memory to steer subsequent model behavior. Affects versions 0.79.0 through 0.91.0. **Fix: upgrade to >= 0.91.1**. CVSS 4.8 (moderate). CWE-732 Incorrect Permission Assignment. Reported by lucasfutures, disclosed 2026-04-24. Source: GitHub Security Advisory (github.com/anthropics/anthropic-sdk-typescript/security/advisories/GHSA-p7fg-763f-g4gf), NVD CVE-2026-41686 · 2026-05-04
- PRODUCT (2026-04-28): Anthropic launched Claude for Creative Work with 9 first-party connectors -- Ableton (Live + Push), Adobe Creative Cloud (Photoshop / Premiere / Express via 'Adobe for creativity'), Affinity by Canva, Autodesk Fusion, Blender, Resolume Arena, Resolume Wire, SketchUp, and Splice. The Blender connector is built on MCP and is explicitly accessible to other LLMs -- not Claude-only. Educational pilots also announced with RISD, Ringling, and Goldsmiths. Tier requirements not specified at launch. This is Anthropic's biggest creative-pro market push to date and pairs naturally with the Opus 4.7 launch on 4/16 (vision quality required for visual workflows). Source: Anthropic news (anthropic.com/news/claude-for-creative-work), 9to5mac, Adobe blog · 2026-04-28
- POLICY (2026-04-04, enforced 2026-04-10): Anthropic excluded third-party agent harnesses (OpenClaw cited specifically) from Claude Pro and Max flat-rate plans. Routing Pro/Max via OpenClaw, Claude-on-Cline, or similar frameworks now triggers separate pay-as-you-go 'extra usage' billing rather than the flat plan rate. ~135K OpenClaw instances were impacted at the time of the change. Anthropic temporarily banned OpenClaw's creator from the platform on 2026-04-10 and stated subscriptions 'weren't built to handle the usage patterns' of harnesses that 'run continuous reasoning loops, automatically repeat or retry tasks, and tie into a lot of other third-party tools.' If you run agentic workloads on Claude, expect the API path to be the only viable model going forward. Source: TechCrunch (techcrunch.com/2026/04/10/anthropic-temporarily-banned-openclaws-creator-from-accessing-claude/), The Next Web, PYMNTS · 2026-04-10
- ENTERPRISE PRICING (2026-04-16): Anthropic dropped Claude Enterprise's bundled-token model. Plan moved from ~$200/seat with discounted token allotment to $20/seat base + standard API rates with no token allotment and no usage cap. Customary 10-15% enterprise API discounts also pulled. Heavy users see 2-3x bill increases. Rolling out to enterprises with 150+ seats first. Material for any team evaluating Claude as their primary AI provider at scale -- confirm finance modeling against the new structure before committing seat counts. Source: The Register (theregister.com/2026/04/16/anthropic_ejects_bundled_tokens_enterprise/), The Information, PYMNTS · 2026-04-16
- Claude Haiku 3 (claude-3-haiku-20240307) RETIRED 2026-04-20 -- deprecated -> retired flip confirmed on Anthropic's deprecations page (verified 2026-04-24). If your API code still targets the 2024 Haiku snapshot, requests are now failing -- migrate to claude-haiku-4-5-20251001. Source: Anthropic model deprecations page · 2026-04
- Claude Sonnet 4 (claude-sonnet-4-20250514) and Claude Opus 4 (claude-opus-4-20250514) retire 2026-06-15 per Anthropic's deprecations page. Announced 2026-04-14. If your product relies on those specific snapshots, schedule migration work to Sonnet 4.6 (`claude-sonnet-4-6`) or Opus 4.7 (`claude-opus-4-7`) before then. Source: Anthropic model deprecations page · 2026-04
- Free tier rate limits feel aggressive -- heavy users get throttled within a few conversations. Source: Reddit r/ClaudeAI · 2026-03
- Occasionally refuses benign creative writing requests due to safety filters. Source: Reddit r/ClaudeAI · 2026-02
- Claude Mythos Preview is Anthropic's most capable model but is gated to ~40 pilot orgs via Project Glasswing for cybersecurity use (AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan, Linux Foundation, Microsoft, Nvidia, Palo Alto Networks among them). It is NOT in consumer Pro/Max tiers -- those get Opus 4.7, which Anthropic concedes trails Mythos on cyber tasks. Anthropic has stated Mythos Preview will NOT be made generally available in the near term. Source: Axios, Anthropic Mythos Preview announcement · 2026-04
- Opus 4.7 uses an updated tokenizer -- input tokens may increase roughly 1.0-1.35x depending on content type, slightly raising per-request cost even though the published per-token rate is unchanged. Source: Anthropic release notes · 2026-04
- Project Deal published 2026-04-25 (anthropic.com/features/project-deal, with TechCrunch + PYMNTS + Legal IT Insider analysis): Anthropic ran a one-week internal marketplace where Claude agents bought, sold, and negotiated on behalf of SF-office employees with no human-in-the-loop. 186 deals closed at ~$4K total volume. Headline finding for Claude API buyers: participants assigned Opus 4.5 got measurably better economic outcomes than those on Haiku 4.5 -- and Haiku-assigned users didn't notice they were losing. Practical takeaway: in agentic workflows where Claude transacts on a user's behalf, model-tier selection has measurable downstream economic cost, not just latency or quality. Treat this as a public signal that Anthropic is moving toward productized agent-as-representative use cases. Source: anthropic.com/features/project-deal, TechCrunch, PYMNTS · 2026-04-25
- Anthropic published an explicit ad-free commitment ('Claude is a space to think', 2026-02-04) -- but the differentiation matters now because OpenAI began rolling ads to ChatGPT Free + Go tiers in Feb 2026 (Plus/Pro/Business/Enterprise still ad-free) and Google AI Overviews already carry ad placements. Anthropic's verbatim language: no sponsored links adjacent to conversations, no advertiser-influenced responses, no third-party product placements. Claude's monetization stays enterprise + subscription only. Practically relevant for B2B / regulated / trust-sensitive deployments (legal, healthcare, finance, research) where ad-incentive contamination in outputs is a deal-breaker. Source: anthropic.com/news/claude-is-a-space-to-think (2026-02-04), openai.com/index/testing-ads-in-chatgpt, Axios · 2026-02-04
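For the two model-retirement items above, a small lookup can catch stale snapshot IDs before they reach the API. This is a minimal sketch, not an official migration tool: the model IDs come from the entries above, but the same-family pairing (Sonnet 4 to Sonnet 4.6, Opus 4 to Opus 4.7) is our assumption, and the helper names `resolve_model` and `find_stale_models` are hypothetical.

```python
# Retired or soon-to-retire snapshot IDs mapped to the replacements named in
# the Known Issues entries. The same-family pairing is an assumption; either
# Sonnet 4.6 or Opus 4.7 is listed as a valid target for the 2025-05-14 snapshots.
REPLACEMENTS = {
    "claude-3-haiku-20240307": "claude-haiku-4-5-20251001",   # retired 2026-04-20
    "claude-sonnet-4-20250514": "claude-sonnet-4-6",          # retires 2026-06-15
    "claude-opus-4-20250514": "claude-opus-4-7",              # retires 2026-06-15
}


def resolve_model(model_id: str) -> str:
    """Return the suggested replacement for a stale snapshot, else the ID unchanged."""
    return REPLACEMENTS.get(model_id, model_id)


def find_stale_models(configured: list[str]) -> dict[str, str]:
    """Map each stale ID found in a config list to its suggested replacement."""
    return {m: REPLACEMENTS[m] for m in configured if m in REPLACEMENTS}


if __name__ == "__main__":
    in_use = ["claude-3-haiku-20240307", "claude-opus-4-7"]
    for old, new in find_stale_models(in_use).items():
        print(f"{old} -> {new}")
```

Running a check like this in CI against your configured model names is one way to surface the June retirement before requests start failing.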
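For the CVE-2026-41686 item above, upgrading the SDK is the fix, but files written by an affected version may keep their loose modes. The sketch below is a generic POSIX permission audit in Python, not part of the Anthropic SDK. It flags anything under a directory that grants access to "other" users and can tighten it to owner-only. The function names and the idea of pointing it at your memory directory are our assumptions; where that directory lives depends on how you configured the tool.

```python
import os
import stat
from pathlib import Path


def _entries(root: str) -> list[Path]:
    """The root itself plus everything beneath it, skipping symlinks."""
    base = Path(root)
    return [p for p in [base, *base.rglob("*")] if not p.is_symlink()]


def find_loose_permissions(root: str) -> list[tuple[str, str]]:
    """Return (path, octal mode) for entries that grant any access to 'other'."""
    findings = []
    for path in _entries(root):
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & stat.S_IRWXO:  # any of read/write/execute for 'other'
            findings.append((str(path), oct(mode)))
    return findings


def tighten(root: str) -> None:
    """Restrict to the owner: 0o700 for directories, 0o600 for files."""
    for path in _entries(root):
        os.chmod(path, 0o700 if path.is_dir() else 0o600)
```

Usage is `find_loose_permissions("/path/to/memory-dir")` to audit and `tighten(...)` to repair. Owner-only modes are a deliberately strict choice; loosen them if another service account legitimately needs access.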
Best for
Writers, analysts, developers, and anyone who values quality of output over quantity of features. If you care about how good the actual text is, Claude is the best.
Not for
People who want an all-in-one platform with image generation, plugins, and browsing built in. ChatGPT's ecosystem is bigger.
Our Verdict
Claude is the LLM you pick when quality matters more than features. Opus 4.7 (April 16, 2026) widened the quality lead on writing and made real step-change gains in software engineering and long-context reasoning, while keeping the $5/$25 per 1M token pricing. The new xhigh reasoning level is the biggest practical change for coding agents -- you can finally dial in real reasoning effort short of max without the latency cost. The 1M context window, 3.75MP vision, and MCP support make it the most capable generally-available model from any vendor today. If you're choosing one to pay $20/mo for, it still comes down to: do you want better outputs (Claude) or more features (ChatGPT)?
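To see how the unchanged per-token rate interacts with the tokenizer change noted under Known Issues, here is a rough per-request estimate. It is a sketch under stated assumptions: we read $5/$25 as input/output price per 1M tokens, we apply the 1.0-1.35x multiplier to input tokens only, and the request size (200,000 input tokens, 4,000 output tokens) is a made-up example.

```python
# Assumed reading of the "$5/$25 per 1M token" pricing: input / output.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def request_cost(input_tokens: int, output_tokens: int,
                 tokenizer_multiplier: float = 1.0) -> float:
    """Estimated USD cost of one request.

    tokenizer_multiplier scales input tokens to model the updated tokenizer
    (roughly 1.0 to 1.35 depending on content type).
    """
    effective_input = input_tokens * tokenizer_multiplier
    total = (effective_input * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens, 4,000 output tokens.
    print(f"{request_cost(200_000, 4_000):.2f}")        # 1.10
    print(f"{request_cost(200_000, 4_000, 1.35):.2f}")  # 1.45
```

In this example the worst-case multiplier raises the bill by about a third on an input-heavy request, which is why long-context workloads feel the tokenizer change most.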
Sources
- GitHub Security Advisory: GHSA-p7fg-763f-g4gf (CVE-2026-41686, 2026-05-04) (accessed 2026-05-05)
- Anthropic: Project Deal (2026-04-25) (accessed 2026-04-27)
- TechCrunch: Anthropic created a test marketplace for agent-on-agent commerce (accessed 2026-04-27)
- Anthropic: Claude is a space to think (ad-free policy, 2026-02-04) (accessed 2026-04-27)
- Anthropic: Introducing Claude Opus 4.7 (accessed 2026-04-16)
- CNBC: Anthropic rolls out Claude Opus 4.7 (accessed 2026-04-16)
- Axios: Opus 4.7 trails unreleased Mythos (accessed 2026-04-16)
- Claude Mythos Preview / Project Glasswing (accessed 2026-04-16)
- LMSYS Chatbot Arena rankings (accessed 2026-04-16)
- Hands-on testing (Opus 4.7 via claude.ai and API) (accessed 2026-04-16)
Alternatives to Claude (Anthropic)
Claude Mythos Preview
Anthropic's most capable model -- a gated research preview via Project Glasswing, cybersecurity-specialized. 73% success on expert CTF tasks, 32-step autonomous network attacks. Not generally available.
Gemini (Google)
Google's LLM with deep Google Workspace integration, 2M token context window, and native code execution
Grok
xAI's irreverent chatbot with a direct line to X/Twitter -- real-time data meets unfiltered personality. Grok 4.3 production launched 2026-05-02 with Custom Voices cloning + Imagine Agent Mode + ~40% API price cut to $1.25/$2.50 per 1M tokens
Muse Spark (Meta)
Meta's first model from its Superintelligence Lab -- natively multimodal with Contemplating mode for multi-agent reasoning
GPT-Rosalind (OpenAI)
OpenAI's first domain-specific model -- life sciences, drug discovery, translational medicine. Launched 2026-04-16 as a Trusted Access research preview. Launch partners: Amgen, Moderna, Allen Institute, Thermo Fisher. Paired with a Life Sciences Codex plugin (50+ scientific tool integrations)
GPT-5.4-Cyber (OpenAI)
OpenAI's defensive-cybersecurity variant of GPT-5.4, launched 2026-04-16. Lowered refusal boundary for security-research tasks and native binary reverse-engineering. Access gated via Trusted Access for Cyber (TAC) program -- thousands of verified defenders, hundreds of teams, no public pricing
Hunyuan 3 (Tencent Hy3)
Tencent's Hy3 Preview launched 2026-04-23 -- 295B total / 21B active MoE, 256K context, open-sourced on HuggingFace under tencent/Hy3-preview. Cheapest frontier-class API at ~1.2 RMB per million input tokens. Integrated into Yuanbao, WeChat, QQ
MiMo (Xiaomi)
Xiaomi's MiMo-V2.5 family launched 2026-04-22 -- Pro (1T total / 42B active MoE, 1M context, native vision+audio reasoning), Multimodal base, TTS (3 sub-models: base, VoiceDesign, VoiceClone), and ASR (open-source, English + Chinese + major dialects). Full voice pipeline for the agent era. Extra-charge 1M-context tier removed at launch