
IBM Granite 4.0

A Tier · 8.2/10

IBM's enterprise-focused open-weight family -- Granite 4.0 pairs a hybrid Mamba-2 + transformer architecture (70-80% memory reduction vs pure transformers) with sizes from Nano 350M up to 32B, all Apache 2.0. First open model family to secure ISO 42001 certification. Nano 350M runs on CPU with 8-16 GB RAM. A 3B Vision variant landed 2026-04-01.

Last updated: 2026-04-17 · Free tier available

Score Breakdown

Ease of Use: 7.0
Output Quality: 8.0
Value: 9.5
Features: 8.5

The Good and the Bad

What we like

  • Hybrid Mamba-2 + transformer architecture delivers 70-80% memory reduction versus a pure transformer at the same quality tier -- the practical result is that Granite 4.0 3B fits inside memory budgets that older 7B transformers needed, and Granite 32B fits where a 70B would have
  • Granite 4.0 Nano (350M and 1.5B) is genuinely runnable on CPU with 8-16 GB RAM -- no GPU required. One of the few open-weight options that realistically runs inside a browser via WebGPU or on a laptop without thermal issues (a minimal CPU-inference sketch follows this list)
  • First open model family with ISO 42001 AI management system certification -- this matters for regulated industries (healthcare, finance, government) where an audit trail on AI governance is a procurement requirement
  • Granite 4.0 3B Vision (2026-04-01) adds multimodality to the lineup at a size consumer GPUs can actually run. Apache 2.0 on the vision variant too
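
To make the CPU-only claim above concrete, here is a minimal sketch of running a Granite 4.0 Nano checkpoint on CPU with Hugging Face transformers. The model ID is an assumption for illustration -- check the ibm-granite organization on Hugging Face for the exact repository name, and note that the hybrid Mamba-2 layers need a recent transformers release.

```python
# Minimal CPU-only inference sketch for Granite 4.0 Nano.
# Requires: pip install transformers torch
# The model ID below is illustrative -- confirm the exact repo name
# under the ibm-granite organization on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-nano-350m"  # hypothetical repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # loads on CPU by default

prompt = "Classify the sentiment of this review as positive or negative: great battery, weak screen."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```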

What could be better

  • Granite does not top absolute-quality open-weight benchmarks -- DeepSeek V3.2, Qwen 3.6, GLM-5.1, and Llama 4 all outscore it at the flagship tier. Granite competes on efficiency / governance / enterprise-fit, not on raw-model leaderboard position
  • Mamba-2 hybrid layers need first-party runtime support -- Ollama and llama.cpp support is improving but has lagged behind pure-transformer models. Check runtime compatibility before committing to Granite on exotic hardware (a quick load-probe sketch follows this list)
  • Smaller community than Llama / Qwen / DeepSeek -- fewer third-party fine-tunes, fewer Reddit help threads, less Stack Overflow coverage. IBM watsonx is the curated-docs route for teams that need that
  • IBM brand is polarizing in open-source -- some developers are enthusiastic about having a US enterprise incumbent in the open-weight game, others assume enterprise = stodgy and skip it entirely. Evaluate the models, not the logo
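
One way to act on the runtime-compatibility caveat above, sketched with llama-cpp-python: try to load a quantized Granite GGUF and fail fast if the local llama.cpp build does not recognize the hybrid Mamba-2 architecture. The GGUF filename is a placeholder rather than a real download, and whether a given build accepts it depends on the llama.cpp version behind your install.

```python
# Quick compatibility probe: can the local llama.cpp build (via
# llama-cpp-python) load a Granite 4.0 GGUF at all?
# The path is a placeholder -- point it at the quantized file you downloaded.
from llama_cpp import Llama

GGUF_PATH = "granite-4.0-3b.Q4_K_M.gguf"  # placeholder filename

try:
    llm = Llama(model_path=GGUF_PATH, n_ctx=2048, verbose=False)
    out = llm("Reply with the single word OK.", max_tokens=4)
    print("Runtime OK:", out["choices"][0]["text"].strip())
except Exception as exc:
    # Builds that predate support for the hybrid Mamba-2 layers
    # typically fail here at load time.
    print("This runtime does not support the model yet:", exc)
```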

Pricing

Self-hosted (Apache 2.0)

$0
  • Apache 2.0 license, unrestricted commercial use
  • Weights on Hugging Face + Ollama + watsonx
  • Full family: Nano 350M / 1.5B, 3B, 3B Vision, 7B, 32B
  • First open model family with ISO 42001 AI management system certification

watsonx.ai (IBM-hosted)

Usage-based, priced per 1M tokens
  • Enterprise-grade SLAs, data residency
  • ISO 42001 + SOC 2 + HIPAA compliance
  • Watsonx.governance integration (model cards, lineage)
  • Priced via IBM enterprise contracts

System Requirements

Hardware needed to self-host. Min = smallest viable setup (usually heavy quantization). Max = full-precision / production-grade.

Model variant | Notes | Min | Max
Granite 4.0 Nano (350M / 1.5B) | Apache 2.0; runs on CPU, no GPU required | 8-16 GB RAM (CPU inference) | Any modern laptop GPU
Granite 4.0 3B / 3B Vision (2026-04-01) | Mamba-2 hybrid; memory per token ~30% lower than a pure transformer at the same scale | 6 GB VRAM Q4 (RTX 3060) | 24 GB VRAM FP16
Granite 4.0 7B / 32B | ISO 42001 certified, enterprise-grade governance | 8 / 24 GB VRAM Q4 | 1× A100 40 GB / 2× H100 FP16
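
The Min/Max figures above follow the usual rule of thumb: weight memory is roughly parameter count × bytes per parameter, plus headroom for activations and runtime buffers. A back-of-the-envelope sketch with an assumed (not measured) 20% overhead factor; the table's entries name GPU classes that comfortably clear these estimates rather than exact byte counts.

```python
# Rule-of-thumb VRAM estimate: weights = params * bytes/param, plus ~20%
# headroom for activations and runtime buffers (the 1.2 factor is an
# assumption, not a measured value for Granite specifically).
def est_gb(params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    return params_billion * 1e9 * bytes_per_param * overhead / 1024**3

for name, params in [("Granite 3B", 3), ("Granite 7B", 7), ("Granite 32B", 32)]:
    q4 = est_gb(params, 0.5)    # ~4-bit quantization
    fp16 = est_gb(params, 2.0)  # half precision
    print(f"{name}: ~{q4:.1f} GB at Q4, ~{fp16:.1f} GB at FP16")
```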

Known Issues

  • Granite 4.0 3B Vision released 2026-04-01 -- community tool support (llama.cpp vision tower, Ollama vision variant) is still catching up in the first 2-3 weeks post-launch. Watsonx.ai has first-class vision support out of the box. Source: IBM announcement, Hugging Face discussions · 2026-04
  • Granite Nano 350M is a capability floor, not a ceiling -- it handles simple classification and short extraction well but struggles on multi-step reasoning. Use the 3B or larger for anything agentic. Source: IBM model cards, VentureBeat coverage · 2025-12

Best for

Regulated-industry enterprises (healthcare, finance, government) that need Apache 2.0 open-weight models with ISO 42001 certification. Also ideal for edge deployments, where Granite Nano (350M / 1.5B) is one of the few open models that realistically runs on CPU. And for any Mamba-hybrid research or low-memory production use where the 70-80% memory reduction actually changes the economics.

Not for

Teams chasing absolute raw quality -- DeepSeek V3.2, GLM-5.1, Qwen 3.6 are all stronger on general benchmarks. Also not for users who want a rich community ecosystem and extensive third-party fine-tunes -- Llama or Qwen wins on that axis.

Our Verdict

IBM Granite 4.0 is the best open-weight option in 2026 for enterprise procurement, edge deployment, and any workflow where the Mamba-2 hybrid architecture's memory savings materially change what hardware you need. The ISO 42001 certification is a genuinely differentiating feature for regulated industries -- nothing else in the open-weight category has it. The tradeoff is absolute quality: Granite is not trying to beat DeepSeek or GLM on benchmarks; it is trying to be the model your compliance team can actually approve. If that matters, Granite is the pick. If it doesn't, you have other options that score higher.

Sources

  • IBM: Granite 4.0 hyper-efficient hybrid models (accessed 2026-04-17)
  • VentureBeat: Granite 4.0 Nano runs locally (accessed 2026-04-17)
  • Hugging Face: IBM Granite 4.0 3B Vision (accessed 2026-04-17)

Alternatives to IBM Granite 4.0


Llama 4 (Meta)

Meta's open-weights flagship family -- Scout (10M context), Maverick (multimodal 400B MoE), Behemoth in preview

B · 7.9/10 · Free tier · From $0
Updated 2026-04-13

Mistral AI

European AI lab with open and commercial models -- Mistral Small 4 (Mar 2026, 119B MoE Apache 2.0 unified model), Medium 3 (Apr 9 2026), and Voxtral TTS (open-source speech, Mar 2026)

B · 7.5/10 · Free tier · From $0
Updated 2026-04-16

DeepSeek

Near-frontier reasoning for pennies on the dollar -- the open-source LLM that made Silicon Valley nervous

A · 8.0/10 · Free tier · From $0
Updated 2026-04-17

Gemma 4 (Google)

Google DeepMind's open-weights model family -- multimodal, 256K context, runs on edge devices

A · 8.3/10 · Free tier · From $0
Updated 2026-04-08

Qwen (Alibaba)

Alibaba's open-weights + API family -- Qwen 3.6-Plus (Mar 30 2026, 1M context + always-on CoT + agentic tool-use), Qwen3.5 Small (2B runs on iPhone, 9B matches 120B-class models), plus Qwen3.5-Omni native multimodal. Apache 2.0 on the open sizes

A · 8.8/10 · Free tier · From $0
Updated 2026-04-17

GLM / Z.ai (Zhipu AI)

Zhipu AI's open-weights family -- GLM-5.1 (launched 2026-04-07) is 744B MoE / 40B active, topped SWE-Bench Pro at 58.4 (beating GPT-5.4 and Claude Opus 4.6), MIT licensed, 200K context. Trained entirely on 100K Huawei Ascend 910B chips -- first frontier model with zero Nvidia in the training stack

A · 8.0/10 · Free tier · From $0
Updated 2026-04-17

Kimi K2.5 (Moonshot)

Moonshot's 1T-parameter MoE open-weights flagship -- best open-source agentic coder, rivals Claude Opus 4.5

A · 8.1/10 · Free tier · From $0
Updated 2026-04-13

Nemotron (Nvidia)

Nvidia's open-weights family -- hybrid Mamba-Transformer MoE architecture, optimized for efficient reasoning on Nvidia hardware

B · 7.8/10 · Free tier · From $0
Updated 2026-04-17

MiniMax M2 / M2.5

MiniMax's open-weights frontier -- first open model to match Claude Opus 4.6 on SWE-Bench at 10-20× lower cost

A · 8.4/10 · Free tier · From $0
Updated 2026-04-13

Falcon (TII)

UAE's Technology Innovation Institute open-weights family -- Falcon 3 optimized for efficient sub-10B deployment on consumer hardware

B · 7.1/10 · Free tier · From $0
Updated 2026-04-13

gpt-oss (OpenAI)

OpenAI's FIRST open-weight models -- gpt-oss-120b (single 80GB GPU, near parity with o4-mini on reasoning) and gpt-oss-20b (runs on 16GB edge devices). Apache 2.0. Launched 2025-08-05. gpt-oss-safeguard ships in 2026 as the safety-tuned variant

A · 8.1/10 · Free tier · From $0
Updated 2026-04-17

Arcee Trinity-Large-Thinking

Arcee AI's US-made open-weight frontier reasoning model -- launched 2026-04-01. 398B total params, ~13B active. Sparse MoE (256 experts, 4 active = 1.56% routing). Apache 2.0, trained from scratch. #2 on PinchBench trailing only Claude 3.5 Opus. ~96% cheaper than Opus-4.6 on agentic tasks

A · 8.1/10 · Free tier · From $0
Updated 2026-04-17

Olmo 3 (AI2)

Allen Institute for AI's fully-open frontier reasoning models -- Olmo 3 family (2025-11-20) includes 7B and 32B sizes, four variants (Base, Think, Instruct, RLZero). Apache 2.0 with fully open data + checkpoints + training logs. Olmo 3-Think 32B matches Qwen3-32B-Thinking at 6x fewer training tokens

B · 7.9/10 · Free tier · From $0
Updated 2026-04-17

AI21 Jamba2

AI21 Labs' hybrid SSM-Transformer (Mamba-style) open-weight family -- Jamba2 launched 2026-01-08. Two sizes: 3B dense (runs on phones / laptops) and Jamba2 Mini MoE (12B active / 52B total). Apache 2.0, 256K context, mid-trained on 500B tokens

A · 8.0/10 · Free tier · From $0
Updated 2026-04-17

StepFun Step 3.5 Flash

StepFun's (China) agent-focused open-weight model -- Step 3.5 Flash launched 2026-02-01. 196B sparse MoE, ~11B active. Benchmarks slightly ahead of DeepSeek V3.2 at over 3x smaller total size. Step 3 (321B / 38B active, Apache 2.0) and Step3-VL-10B multimodal also in the family

B · 7.8/10 · Free tier · From $0
Updated 2026-04-17

Cohere Command A

Cohere's enterprise-multilingual flagship -- 111B params, 256K context, runs on 2x H100. 23 languages. CC-BY-NC 4.0 on weights (research / non-commercial), commercial requires Cohere enterprise contract. Follow-ups: Command A Reasoning + Command A Vision

B · 7.5/10 · Free tier · From $0
Updated 2026-04-17