Issue · May 2026
Small open LLMs, ranked by what you'll actually do with them.
Curated picks for chat, code, agents, RAG, vision, and on-device. Plus fine-tuning notes.
Best small models, by task
- General chat · 13 ranked
  All-purpose models for chatbots and assistants. Instruction-following and license trump benchmarks.
  Top pick · Qwen3.6-27B
- Coding · 9 ranked
  Code completion, generation, and review. Small specialists now beat generalist 70Bs.
  Top pick · Qwen3-Coder-Next
- Reasoning · 8 ranked
  Models that think before answering. Small specialists nearly match frontier-scale models on math and logic.
  Top pick · Phi-4 Reasoning 14B
- RAG / long context · 9 ranked
  Models that hold up when stuffed with retrieved context. The dominant small-LLM use case in production.
  Top pick · Qwen3.6-27B
- Agents & function calling · 9 ranked
  Models that emit clean tool calls and recover from errors gracefully.
  Top pick · Qwen3-Coder-Next
- On-device / mobile · 5 ranked
  Small enough to run on a phone, laptop, or embedded device. The 1-4B effective-parameter tier is the one to watch.
  Top pick · Phi-4-mini 3.8B
- Vision-language · 6 ranked
  Models that take images alongside text. Native multimodal pretraining is the 2026 default.
  Top pick · Qwen2.5-VL 7B Instruct
- Multilingual · 6 ranked
  Models that work outside English without falling off a cliff. Tokenizer choice matters most.
  Top pick · Gemma 4 31B
- Structured extraction · 5 ranked
  Models that turn messy text into clean JSON. Half of production LLM workloads run on this.
  Top pick · Qwen3.5-9B
Recently tracked models
- Qwen3.6-27B · 27.0B · Apache 2.0 · 1.3M/mo · coding, agents, general-chat, +3 more
- Gemma 4 31B · 31.0B · Apache 2.0 · 8M/mo · general-chat, coding, reasoning, +4 more
- Gemma 4 E4B · 4.0B effective · Apache 2.0 · 5.3M/mo · general-chat, on-device, vision, +1 more
- Qwen3.5-9B · 9.0B · Apache 2.0 · 7.5M/mo · general-chat, coding, reasoning, +5 more
- Qwen3-Coder-Next · 3.0B active · Apache 2.0 · 1M/mo · coding, agents, rag
- Nemotron 3 Nano 30B-A3B · 3.5B active · NVIDIA Nemotron Open Model License · 1.3M/mo · agents, reasoning, on-device, +1 more
- gpt-oss-20b · 3.6B active · Apache 2.0 · 7M/mo · reasoning, on-device, general-chat, +2 more
- Mistral Small 3.2 24B · 24.0B · Apache 2.0 · 1.1M/mo · general-chat, coding, rag, +4 more
- Phi-4 Reasoning 14B · 14.0B · MIT · 17K/mo · reasoning, coding, general-chat
- Qwen3-8B Instruct · 8.2B · Apache 2.0 · 10.2M/mo · general-chat, coding, reasoning, +4 more
- Phi-4-mini 3.8B · 3.8B · MIT · 1.6M/mo · general-chat, on-device, reasoning, +1 more
- Qwen2.5-VL 7B Instruct · 7.6B · Apache 2.0 · 8.9M/mo · general-chat, vision, multilingual, +2 more
- Llama 3.2 3B Instruct · 3.2B · Llama 3.2 Community · 2.2M/mo · general-chat, on-device, rag
- Llama 3.1 8B Instruct · 8.0B · Llama 3.1 Community · 9.5M/mo · general-chat, rag, agents, +1 more
Stats refreshed 40m ago from Hugging Face.