AI Aggregator

Best small LLMs for on-device / mobile

Models small enough to run on a phone, laptop, or embedded device. The most interesting tier is 1-4B effective parameters.

On-device inference has different constraints than server inference: RAM matters more than throughput, battery drain matters as much as latency, and running out of memory on a phone is a far worse failure mode than a server 503.

The 2026 wave of "effective parameter" models (Gemma 4 E2B/E4B, smaller Qwen3 variants) accepts extra training complexity in exchange for footprints that fit consumer hardware. Native multimodality at this size is genuinely new.

What we look for

  • Quantized quality at Q4_K_M / Q5_K_M, not bf16. If quality collapses below int8, the model is not an on-device candidate.
  • Cold-start time on Apple Silicon and Snapdragon.
  • Memory ceiling - total RAM needed, not just weight size.
  • License clarity for redistribution when shipping weights inside an app.
  • Multimodal feasibility - can it do useful work on screenshots, photos, and short audio, or is it text-only?
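To make the "total RAM, not just weight size" point concrete, here is a rough back-of-the-envelope sketch. It assumes a dense transformer with grouped-query attention and an fp16 KV cache. The layer count, KV-head count, head dimension, the ~4.85 bits/weight figure for Q4_K_M, and the 0.5 GiB runtime overhead are all illustrative assumptions, not measurements of any model listed here.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    """Hypothetical model shape; every field is an assumption for illustration."""
    params: float      # total parameter count
    n_layers: int      # transformer layers that keep a KV cache
    n_kv_heads: int    # key/value heads (grouped-query attention)
    head_dim: int      # dimension per attention head


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to hold the quantized weights."""
    return params * bits_per_weight / 8


def kv_cache_bytes(shape: ModelShape, context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes for the KV cache: one key and one value vector per layer, head, and token."""
    return 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * context_tokens * bytes_per_value


def memory_ceiling_gib(shape: ModelShape, bits_per_weight: float,
                       context_tokens: int, overhead_gib: float = 0.5) -> float:
    """Weights + KV cache + a flat allowance for runtime buffers (assumed, not measured)."""
    total = weight_bytes(shape.params, bits_per_weight) + kv_cache_bytes(shape, context_tokens)
    return total / GIB + overhead_gib


# Illustrative ~3B dense model; bits_per_weight approximates a Q4_K_M-style quantization.
example = ModelShape(params=3.2e9, n_layers=28, n_kv_heads=8, head_dim=128)

for ctx in (4096, 32768, 131072):
    print(f"context {ctx:>6}: ~{memory_ceiling_gib(example, 4.85, ctx):.1f} GiB")
```

Under these assumptions the KV cache grows linearly with context: about 0.44 GiB at 4K tokens but 14 GiB at 128K, far more than the weights themselves. An advertised context length and a context length that fits in phone RAM can be very different numbers.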

The picks are ranked for shipping inference inside a mobile app or on an edge device.

Picks

  1. Gemma 4 E4B · 4.0B effective · Apache 2.0

    Native multimodal (text, image, video, audio) at edge sizes. Apache 2.0. ~4B effective inference footprint built to preserve RAM and battery on consumer devices.

  2. Nemotron 3 Nano 30B-A3B · 3.5B active · NVIDIA Nemotron Open Model License

    Hybrid Mamba2-Transformer-MoE: 3.5B active out of 30B total, 256K default context (1M max). Trained from scratch on 25T tokens. Strong agentic and tool-calling post-training.

  3. gpt-oss-20b · 3.6B active · Apache 2.0

    OpenAI's small open-weight model. 21B total / 3.6B active MoE, runs in 16GB at MXFP4. Configurable reasoning effort (low/medium/high). Matches o3-mini on common reasoning evals.

  4. Phi-4-mini · 3.8B · MIT

    MIT license, 67% MMLU at 3.8B. Inherits the Phi reasoning lineage in a small footprint. 128K context, 200K-token vocabulary for multilingual support. Function-calling support.

  5. Llama 3.2 3B Instruct · 3.2B · Llama 3.2 Community

    Meta's mobile-targeted small model. Largest ecosystem at this size class. 128K context. Solid baseline for on-device assistants where ecosystem maturity matters.
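For the MoE entries above, the "active" figure describes per-token compute, while the total parameter count decides how much memory the weights occupy. A minimal sketch of that arithmetic, using the 21B total / 3.6B active figures quoted for gpt-oss-20b and assuming, as a simplification, that every weight is stored at MXFP4 (4-bit values plus one shared 8-bit scale per 32-value block). Real checkpoints typically keep some tensors at higher precision, so treat the output as a lower-bound estimate.

```python
def weights_gb(param_count: float, bits_per_weight: float) -> float:
    """Size in decimal gigabytes of param_count weights at the given bit width."""
    return param_count * bits_per_weight / 8 / 1e9


# MXFP4: 4 bits per value plus an 8-bit scale shared across a block of 32 values.
MXFP4_BITS = 4 + 8 / 32

total_params = 21e9    # total parameters, from the gpt-oss-20b entry above
active_params = 3.6e9  # parameters used per token, from the same entry

resident = weights_gb(total_params, MXFP4_BITS)
touched = weights_gb(active_params, MXFP4_BITS)

print(f"weights that must stay resident: ~{resident:.1f} GB")
print(f"weights read per token:          ~{touched:.1f} GB")
print(f"active fraction:                 {active_params / total_params:.0%}")
```

The resident figure comes out a little over 11 GB, which is consistent with the "runs in 16GB" claim once KV cache and runtime buffers are added. It also shows why a 3.5B-active MoE is a laptop-class pick rather than a phone-class one: the device still has to hold all of the experts.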