Issue · May 2026
Small open LLMs, ranked by what you'll actually do with them.
Curated picks for chat, code, agents, RAG, vision, and on-device. Plus fine-tuning notes.
Best small models, by task
- General chat · 9 ranked
  All-purpose models for chatbots and assistants. Instruction-following and license trump benchmarks.
  Top pick · Gemma 4 31B
- Coding · 6 ranked
  Code completion, generation, and review. Small specialists now beat generalist 70B models.
  Top pick · Qwen3-Coder-Next
- Reasoning · 4 ranked
  Models that think before answering. Small specialists nearly match frontier-scale models on math and logic.
  Top pick · Phi-4 Reasoning 14B
- RAG / long context · 7 ranked
  Models that hold up when stuffed with retrieved context. The dominant production use case for small LLMs.
  Top pick · Gemma 4 31B
- Agents & function calling · 5 ranked
  Models that emit clean tool calls and recover gracefully from errors.
  Top pick · Qwen3-Coder-Next
- On-device / mobile · 3 ranked
  Small enough to run on a phone, laptop, or embedded device. The 1-4B effective-parameter tier is where the interesting models are.
  Top pick · Phi-4-mini 3.8B
- Vision-language · 4 ranked
  Models that take images alongside text. Native multimodal pretraining is the 2026 default.
  Top pick · Qwen2.5-VL 7B Instruct
- Multilingual · 5 ranked
  Models that work outside English without falling off a cliff. Tokenizer choice matters most.
  Top pick · Gemma 4 31B
- Structured extraction · 4 ranked
  Models that turn messy text into clean JSON. Half of production LLM workloads run on this.
  Top pick · Mistral Small 3.2 24B
Recently tracked models
- Gemma 4 31B · 31.0B · Apache 2.0 · 7.9M/mo · general-chat, coding, reasoning, +4 more
- Gemma 4 E4B · 4.0B · Apache 2.0 · 5.2M/mo · general-chat, on-device, vision, +1 more
- Qwen3-Coder-Next · 3.0B · Apache 2.0 · 1M/mo · coding, agents, rag
- Mistral Small 3.2 24B · 24.0B · Apache 2.0 · 1.1M/mo · general-chat, coding, rag, +4 more
- Phi-4 Reasoning 14B · 14.0B · MIT · 17.4K/mo · reasoning, coding, general-chat
- Qwen3-8B Instruct · 8.2B · Apache 2.0 · 10M/mo · general-chat, coding, reasoning, +4 more
- Phi-4-mini 3.8B · 3.8B · MIT · 1.6M/mo · general-chat, on-device, reasoning, +1 more
- Qwen2.5-VL 7B Instruct · 7.6B · Apache 2.0 · 8.9M/mo · general-chat, vision, multilingual, +2 more
- Llama 3.2 3B Instruct · 3.2B · Llama 3.2 Community · 2.1M/mo · general-chat, on-device, rag
- Llama 3.1 8B Instruct · 8.0B · Llama 3.1 Community · 9.6M/mo · general-chat, rag, agents, +1 more
Stats sourced from Hugging Face.