Nemotron 3 Nano 30B-A3B
Strengths
Hybrid Mamba-2 / Transformer / MoE: 3.5B active out of 30B total, 256K default context (1M max). Trained from scratch on 25T tokens. Strong agentic and tool-calling post-training.
Weaknesses
NVIDIA Nemotron Open Model License (permissive, but not Apache / MIT - downstream redistribution must carry the license forward). Hybrid-architecture support is uneven across inference stacks.
Nemotron 3 Nano is NVIDIA's small open-weight entry in the v3 family: 30B total, 3.5B active, hybrid Mamba-2 / Transformer / MoE architecture trained from scratch. The architecture is the editorial story - 23 Mamba-2 layers and 23 MoE layers interleaved with six attention layers, a mix that trades some quality-per-parameter for substantially better long-context throughput than a pure transformer.
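As a rough illustration of that layer mix, here is a minimal sketch that builds a 52-layer schedule matching the 23 / 23 / 6 split above. The attention-layer positions and the strict Mamba-2 / MoE alternation are assumptions made for illustration, not the published ordering.

```python
# Sketch of a 52-layer hybrid schedule: 23 Mamba-2, 23 MoE, 6 attention blocks.
# Only the per-type counts come from the description above; the attention
# positions and the strict alternation are assumptions for illustration.
from collections import Counter

ATTENTION_SLOTS = {3, 11, 19, 27, 35, 43}  # assumed positions of the 6 attention layers

def build_layer_schedule(total_layers: int = 52) -> list[str]:
    schedule = []
    use_mamba = True  # alternate Mamba-2 and MoE blocks between attention layers
    for i in range(total_layers):
        if i in ATTENTION_SLOTS:
            schedule.append("attention")
        else:
            schedule.append("mamba2" if use_mamba else "moe")
            use_mamba = not use_mamba
    return schedule

if __name__ == "__main__":
    print(Counter(build_layer_schedule()))
    # Counter({'mamba2': 23, 'moe': 23, 'attention': 6})
```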
Post-training emphasized agentic flows: tool calling, instruction following, code, and math, with the 3.5B active count keeping latency low even on consumer-grade GPUs.
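A quick way to exercise the tool-calling post-training is through any OpenAI-compatible endpoint. The base URL, model identifier, and tool schema below are placeholders, not values published with the model.

```python
# Hypothetical tool-calling smoke test against an OpenAI-compatible server.
# The base_url, model name, and tool schema are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

resp = client.chat.completions.create(
    model="nemotron-3-nano-30b-a3b",  # placeholder identifier
    messages=[{"role": "user", "content": "What's the weather in Oslo right now?"}],
    tools=[weather_tool],
)
print(resp.choices[0].message.tool_calls)
```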
When to pick it
- Long-context agent loops where Mamba-2's linear-time, constant-memory sequence handling pays for itself.
- 16-24GB hardware where MoE active-param speed matters more than total VRAM (Q4_K_XL fits 16GB; partial expert offload extends further); see the loading sketch after this list.
- You can accept the NVIDIA Nemotron Open Model License terms.
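For the 16-24GB case, a minimal loading sketch with llama-cpp-python, assuming a GGUF quant and a llama.cpp build that already supports the hybrid architecture; the file name is a placeholder and the context size is deliberately conservative.

```python
# Minimal sizing sketch for a 16-24GB GPU via llama-cpp-python. Assumes a
# llama.cpp build with Mamba-2 hybrid support; the GGUF file name is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="nemotron-3-nano-30b-a3b-Q4_K_XL.gguf",  # placeholder file name
    n_gpu_layers=-1,  # all layers on GPU; use a smaller positive count to spill some to CPU RAM
    n_ctx=32768,      # conservative; raise toward the 256K default if memory allows
    verbose=False,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the trade-offs of MoE expert offload."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```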
When to skip it
- Strict Apache / MIT requirement - reach for Qwen3.5-9B or Gemma 4 instead.
- Vision or multimodal tasks - Nano is text-only; pick the Nemotron 3 Nano Omni variant or a different family.
- Your inference stack hasn't shipped Mamba-2 hybrid support yet.