Index
Categories
Small open LLMs grouped by the tasks developers actually use them for.
-
General chat 9 ranked
All-purpose models for chatbots and assistants. Instruction following and licensing trump benchmarks.
Top pick · Gemma 4 31B
-
Coding 6 ranked
Code completion, generation, and review. Small specialists now beat generalist 70Bs.
Top pick · Qwen3-Coder-Next
-
Reasoning 4 ranked
Models that think before answering. Small specialists nearly match frontier-scale models on math and logic.
Top pick · Phi-4 Reasoning 14B
-
RAG / long context 7 ranked
Models that hold up when stuffed with retrieved context. The dominant small-LLM use case in production.
Top pick · Gemma 4 31B
-
Agents & function calling 5 ranked
Models that emit clean tool calls and recover from errors gracefully.
Top pick · Qwen3-Coder-Next
-
On-device / mobile 3 ranked
Small enough to run on a phone, laptop, or embedded device. The 1-4B effective-parameter tier is the one to watch.
Top pick · Phi-4-mini 3.8B
-
Vision-language 4 ranked
Models that take images alongside text. Native multimodal pretraining is the 2026 default.
Top pick · Qwen2.5-VL 7B Instruct
-
Multilingual 5 ranked
Models that work outside English without falling off a cliff. Tokenizer choice matters most.
Top pick · Gemma 4 31B
-
Structured extraction 4 ranked
Models that turn messy text into clean JSON. Half of production LLM workloads are extraction jobs like this.
Top pick · Mistral Small 3.2 24B