AI Aggregator

Models  /  Nemotron

Nemotron 3 Nano 30B-A3B

nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

agents · reasoning · on-device · general-chat · gpu-16gb · gpu-24gb · gpu-48gb · apple-silicon-32gb · apple-silicon-64gb
Parameters
3.5B active (30B total)
Family
Nemotron
License
NVIDIA Nemotron Open Model License
Context length
262,144 tokens
Languages
en, es, fr, de, it, ja, multi
Modalities
text
Released
2025-12-15
HF downloads (30d)
1,325,355
Stats updated
today

Strengths

Hybrid Mamba2-Transformer-MoE: 3.5B active out of 30B total, 256K default context (1M max). Trained from scratch on 25T tokens. Strong agentic and tool-calling post-training.

Weaknesses

NVIDIA Nemotron Open Model License (permissive, but not Apache/MIT - downstream redistribution must carry the license forward). Hybrid-architecture support is uneven across inference stacks.

Nemotron 3 Nano is NVIDIA's small open-weight entry in the v3 family: 30B total, 3.5B active, hybrid Mamba-2 / Transformer / MoE architecture trained from scratch. The architecture is the editorial story - 23 Mamba-2 layers and 23 MoE layers interleaved with six attention layers, which trades some quality-per-parameter for substantially better long-context throughput than a pure transformer.
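The active-vs-total split is the whole trade: memory footprint is set by the total parameter count, while per-token compute tracks the active count. A back-of-envelope sketch using the card's 30B/3.5B figures (the ~2 FLOPs-per-parameter-per-token rule is a standard approximation, not a published spec):

```python
# Back-of-envelope: MoE memory cost scales with TOTAL params,
# per-token compute with ACTIVE params.
TOTAL_PARAMS = 30e9    # all experts must be resident somewhere
ACTIVE_PARAMS = 3.5e9  # params actually exercised per token

BYTES_PER_PARAM_BF16 = 2
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_BF16 / 1e9  # BF16 weight footprint

# Rule of thumb: ~2 FLOPs per active parameter per generated token.
flops_per_token = 2 * ACTIVE_PARAMS

print(f"BF16 weights: {weights_gb:.0f} GB")
print(f"Compute per token: {flops_per_token / 1e9:.0f} GFLOPs "
      f"(a dense 30B would need ~{2 * TOTAL_PARAMS / 1e9:.0f})")
```

So the model decodes with roughly the arithmetic cost of a dense ~3.5B model while carrying 30B worth of weights, which is why quantization and offload strategies matter so much for this family.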

Post-training emphasized agentic flows: tool calling, instruction following, code, and math, with the 3.5B active count keeping latency low even on consumer-grade GPUs.
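Since tool calling is the headline of the post-training, here is a minimal sketch of an OpenAI-compatible tool-call request such as vLLM or llama.cpp servers accept. The `get_weather` function and its schema are illustrative assumptions, not part of the model card:

```python
import json

# Hypothetical tool schema in the OpenAI-compatible "tools" format.
# The function name and parameters are illustrative only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request = {
    "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
    "messages": [{"role": "user", "content": "Weather in Berlin?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(request, indent=2))
```

POSTed to a `/v1/chat/completions` endpoint, a tool-use-tuned model responds with a `tool_calls` entry rather than free text when the tool is relevant.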

When to pick it

  • Long-context agent loops where Mamba-2's linear-time sequence processing pays for itself.
  • 16-24GB hardware where MoE active-param speed matters more than total VRAM (Q4_K_XL fits 16GB; partial expert offload extends further).
  • You can accept the NVIDIA Nemotron Open Model License terms.
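The "partial expert offload" point in the list above rests on MoE geometry: routed expert tensors dominate total size, but only a fraction fires per token, so parking them in system RAM keeps the GPU-resident footprint small. A rough sketch; the dense/expert split below is an assumed illustration, not a published figure:

```python
# ASSUMPTION: roughly 28B of the 30B params sit in routed expert
# tensors and ~2B in dense (attention/Mamba-2/shared) layers.
# These split numbers are illustrative, not from the model card.
expert_params = 28e9
dense_params = 2e9
BITS_PER_PARAM_Q4 = 4.5  # rough average for 4-bit K-quants

def size_gb(params: float, bits: float) -> float:
    """Quantized weight size in GB for a given bits-per-parameter."""
    return params * bits / 8 / 1e9

full_model = size_gb(expert_params + dense_params, BITS_PER_PARAM_Q4)
gpu_resident = size_gb(dense_params, BITS_PER_PARAM_Q4)  # experts on CPU

print(f"Full Q4 model: {full_model:.1f} GB")
print(f"GPU-resident with experts offloaded: {gpu_resident:.1f} GB (+ KV cache)")
```

Under these assumptions the dense layers alone occupy only a sliver of VRAM, which is why expert offload lets the model run on cards well under the full model's quantized size, at the cost of PCIe traffic per routed expert.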

When to skip it

  • Strict Apache/MIT licensing requirement - reach for Qwen3.5-9B or Gemma 4 instead.
  • Vision or multimodal tasks - Nano is text-only; pick the Nemotron 3 Nano Omni variant or a different family.
  • Inference stack hasn't shipped Mamba-2 hybrid support yet.