AI Aggregator

Categories

Best small LLMs for on-device / mobile

Small enough to run on a phone, laptop, or embedded device. The 1-4B effective-parameter tier is where capability and footprint currently meet.

On-device has different constraints than server inference: RAM matters more than throughput, battery as much as latency, and running out of memory on a phone is a much worse failure mode than a server 503.

The 2026 wave of "effective parameter" models (Gemma 4 E2B/E4B, smaller Qwen3 variants) trades training complexity for footprints that fit consumer hardware. Native multimodal at this size is genuinely new.

What we look for

  • Quantized quality at Q4_K_M / Q5_K_M, not bf16. If it collapses below int8, it's not on-device.
  • Cold-start time on Apple Silicon and Snapdragon.
  • Memory ceiling - total RAM at inference time (weights plus KV cache and runtime buffers), not just the weight file size.
  • License clarity for redistribution when shipping weights inside an app.
  • Multimodal feasibility - useful screenshots / photos / short audio, or text-only?
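The memory-ceiling criterion above can be made concrete with a back-of-envelope estimate: total RAM is roughly quantized weights plus KV cache plus runtime overhead. A minimal sketch in Python; the bits-per-weight figures, the layer/head dimensions, and the overhead constant are all illustrative assumptions, not measurements of any specific model.

```python
# Rough memory-ceiling estimate for a quantized model on-device.
# All constants are illustrative assumptions, not measured figures.

def memory_ceiling_mb(
    n_params_b: float,       # parameters, in billions
    bits_per_weight: float,  # ~4.5 for Q4_K_M, ~5.5 for Q5_K_M (approximate)
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    kv_bytes: int = 2,           # fp16 KV cache entries
    overhead_mb: float = 300.0,  # runtime buffers + activations (assumed)
) -> float:
    # Quantized weight footprint: params * bits / 8, in MB
    weights_mb = n_params_b * 1e9 * bits_per_weight / 8 / 1e6
    # KV cache: keys + values, per layer, per KV head, per position
    kv_mb = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes / 1e6
    return weights_mb + kv_mb + overhead_mb

# Hypothetical 3B-class model at Q4_K_M with an 8K context window:
print(round(memory_ceiling_mb(3.0, 4.5, 28, 8, 128, 8192)))  # ~2.9 GB
```

The point of the exercise: on a 6 GB phone, a "1.7 GB" Q4 download can still mean nearly 3 GB resident once the KV cache is allocated for a long context, which is why the ceiling matters more than the weight size.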

Ranked for shipping inference into a mobile app or edge device.

Picks

  1. Gemma 4 E4B · 4.0B · Apache 2.0

    Native multimodal (text, image, video, audio) at edge sizes, with a ~4B effective inference footprint built to preserve RAM and battery on consumer devices.

  2. Phi-4-mini · 3.8B · MIT

    MIT license, 67% MMLU at 3.8B. Inherits the Phi reasoning lineage in a small footprint. 128K context, 200K-token vocabulary for multilingual support. Function-calling support.

  3. Llama 3.2 3B Instruct · 3.2B · Llama 3.2 Community

    Meta's mobile-targeted small model. Largest ecosystem at this size class. 128K context. Solid baseline for on-device assistants where ecosystem maturity matters.