Mistral Small 3.2 24B
mistralai/Mistral-Small-3.2-24B-Instruct-2506
Strengths
Apache-2.0 mid-size all-rounder. ~81% MMLU at ~150 t/s, roughly 3x faster than Llama 3.3 70B at similar quality. 128K context window. Vision support added in the 3.x line.
Weaknesses
A 24B model at q4 wants a 16GB GPU at minimum and gets tight at long context. CPU inference is too slow at this size to be practical.
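To make the 16GB floor concrete: 24 billion parameters at 4 bits is roughly 12 GB of weights before the KV cache, so long contexts eat the remaining headroom fast. Here is a minimal load sketch, assuming transformers with bitsandbytes 4-bit quantization; the prompt is illustrative, and 3.x vision checkpoints may need the image-text-to-text auto class rather than the causal-LM one.

```python
# Minimal 4-bit load sketch (assumes transformers + bitsandbytes + accelerate installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"

# nf4 with bf16 compute is the usual "q4-class" setting; weights land around 12 GB.
quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(  # vision-capable 3.x checkpoints may need AutoModelForImageTextToText
    MODEL_ID,
    quantization_config=quant,
    device_map="auto",  # spills layers to CPU if VRAM runs out, at a steep speed cost
)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "One-line summary of the Apache 2.0 license."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```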
Mistral Small 3.2 is the fast, Apache-licensed mid-size model in your stack. The "Small 3" line was Mistral's bet on latency-optimized 24B models that could replace 70B-class generalists at a fraction of the inference cost. The 3.x updates added 128K context and vision understanding while keeping the speed.
For products that need one model to handle chat, RAG, structured extraction, and light vision tasks behind a single API, Small 3.2 is the open-weight default in 2026.
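A serving sketch for that single-API setup, assuming vLLM's OpenAI-compatible server; the endpoint URL, flags, and prompts here are placeholders for a local deployment, not canonical values.

```python
# Assumes a vLLM server started with something like:
#   vllm serve mistralai/Mistral-Small-3.2-24B-Instruct-2506 --tokenizer-mode mistral
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
MODEL = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"

# Chat / RAG: stuff retrieved passages into the prompt as usual.
chat = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": "Context: <retrieved passages>\n\nQuestion: <user question>"},
    ],
)
print(chat.choices[0].message.content)

# Structured extraction: constrain the output to valid JSON.
extract = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Extract {vendor, total} as JSON from: 'Acme invoice, $420.'"}],
    response_format={"type": "json_object"},
)
print(extract.choices[0].message.content)
```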
When to pick it
- One Apache-2.0 model for general-purpose serving where Qwen3-8B isn't enough but Gemma 4 31B is more than you want to pay for.
- Latency-sensitive deployments. ~150 t/s on a single H100, much faster than 70B alternatives.
- Vision is a nice-to-have, not the headline feature (a sample vision call follows this list).
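For the vision nice-to-have, the same OpenAI-compatible endpoint accepts image parts in the standard multimodal message format; the image URL below is a placeholder, and a base64 data: URI works too.

```python
# Light vision call against the same local endpoint as the serving sketch above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what this image shows in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.png"}},  # placeholder URL
        ],
    }],
)
print(resp.choices[0].message.content)
```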
When to skip it
- Your hardware floor is 8GB, or you need on-device inference. Use Llama 3.2 3B or Gemma 4 E4B instead.
- You want bleeding-edge benchmarks. Qwen 3.5+ is ahead on most evals; Mistral leans on speed and license clarity.