Best small LLMs for general chat

All-purpose models for chatbots and assistants. Instruction-following and license trump benchmarks.

The most crowded category in small-LLM-land. Every major lab ships an instruction-tuned generalist, and at 7-8B the differences come down to tokenizer, license, and ecosystem more than raw capability.

What we look for

  • Instruction-following - does it actually do what you ask?
  • Refusal calibration - over-safe models become useless for benign tasks.
  • Format stability - JSON mode, system-prompt adherence.
  • Ecosystem - quantizations, training recipes, inference engines.
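As a rough illustration of how the format-stability criterion might be scored, the sketch below measures what fraction of raw model replies parse as a JSON object containing a set of required keys. The function name `json_adherence`, the sample replies, and the required keys are all hypothetical; this is one possible check, not the evaluation harness behind the ranking.

```python
import json

def json_adherence(replies, required_keys):
    """Return the fraction of replies that parse as a JSON object
    containing every key in required_keys."""
    if not replies:
        return 0.0
    ok = 0
    for text in replies:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue  # not valid JSON at all
        if isinstance(obj, dict) and all(k in obj for k in required_keys):
            ok += 1
    return ok / len(replies)

# Hypothetical replies to a prompt asking for {"city": ..., "temp_c": ...}
sample_replies = [
    '{"city": "Oslo", "temp_c": 4}',
    'Sure! Here is the JSON: {"city": "Oslo", "temp_c": 4}',  # chatty wrapper breaks parsing
    '{"city": "Oslo"}',                                       # missing required key
    '{"city": "Lima", "temp_c": 19}',
]
score = json_adherence(sample_replies, ["city", "temp_c"])
print(f"JSON adherence: {score:.0%}")
```

Strict parsing is deliberate here: a reply that wraps valid JSON in conversational text counts as a failure, since downstream code would also fail to parse it.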

Ranked for an English-speaking developer building a generalist chatbot.

Picks

  1. Gemma 4 31B · 31.0B parameters · Apache 2.0

    31B dense, Apache 2.0, 256K context, multimodal. AIME 2026 89.2%, Codeforces ELO 2150 - leads open dense models in its size class for math and competitive programming. Bridges 'serious work' and 'fits on a 24-48GB GPU'.

  2. Gemma 4 E4B · 4.0B parameters · Apache 2.0

    Native multimodal (text, image, video, audio) at edge sizes. Apache 2.0. ~4B effective inference footprint built to preserve RAM and battery on consumer devices.

  3. Mistral Small 3.2 24B · 24.0B parameters · Apache 2.0

    Apache 2.0 mid-size all-rounder. ~81% MMLU at about 150 tokens/s, roughly 3x the throughput of Llama 3.3 70B at similar quality. 128K context. Vision support added in the 3.x line.

  4. Phi-4 Reasoning 14B · 14.0B parameters · MIT

    Punches above its weight on reasoning. Beats DeepSeek-R1-Distill-Llama-70B on AIME and GPQA at one-fifth the parameter count. Comparable to full DeepSeek-R1 (671B) on AIME 2025. MIT license.

  5. Qwen3-8B Instruct · 8.2B parameters · Apache 2.0

    Strong all-rounder in the 7-8B class. Apache 2.0. 32K native context, 131K with YaRN. Hybrid 'thinking' mode you can toggle per request.

  6. Phi-4-mini · 3.8B parameters · MIT

    MIT license, 67% MMLU at 3.8B. Inherits the Phi reasoning lineage in a small footprint. 128K context, 200K-token vocabulary for multilingual support. Function-calling support.

  7. Qwen2.5-VL 7B Instruct · 7.6B parameters · Apache 2.0

    Vision-language specialist at 7B. Beats Llama 3.2-Vision 11B on MMMU (58.6), MathVista (68.2), and DocVQA (95.7). Apache 2.0. Supports variable resolution and aspect ratio, plus video-frame input.

  8. Llama 3.2 3B Instruct · 3.2B parameters · Llama 3.2 Community

    Meta's mobile-targeted small model. Largest ecosystem in this size class. 128K context. Solid baseline for on-device assistants where ecosystem maturity matters.

  9. Llama 3.1 8B Instruct · 8.0B parameters · Llama 3.1 Community

    The ecosystem baseline. Largest community of fine-tunes, quantizations, and inference-engine support of any open small model. Predictable in production.
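
The parameter counts in the picks translate into memory budgets with simple arithmetic: weights take roughly (parameters × bits per weight ÷ 8) bytes. The sketch below applies that to a few of the picks. It counts weights only; KV cache, activations, and runtime overhead are ignored, so real requirements will be higher. The function name `weight_gb` and the choice of bit widths are illustrative assumptions.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8

# Parameter counts (in billions) as listed in the picks.
picks = {
    "Gemma 4 31B": 31.0,
    "Mistral Small 3.2 24B": 24.0,
    "Phi-4 Reasoning 14B": 14.0,
    "Qwen3-8B Instruct": 8.2,
    "Llama 3.2 3B Instruct": 3.2,
}

for name, size in picks.items():
    row = ", ".join(
        f"{bits}-bit: {weight_gb(size, bits):.1f} GB" for bits in (16, 8, 4)
    )
    print(f"{name}: {row}")
```

Under these assumptions the 31B model needs about 15.5 GB of weights at 4-bit and 31 GB at 8-bit, which is consistent with the 24-48GB GPU framing, while 16-bit weights (62 GB) would not fit on a single card in that range.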