Mistral Small 3.2 24B
mistralai/Mistral-Small-3.2-24B-Instruct-2506
Strengths
Apache-2.0 mid-size all-rounder. ~81% MMLU at ~150 t/s, roughly 3x faster than Llama 3.3 70B at similar quality. 128K context window. Vision support added in the 3.x line.
Weaknesses
A 24B model at q4 wants a 16GB GPU at minimum and gets tight at long context. CPU inference is too slow at this size to be practical.
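To make the 16GB floor concrete: 24 billion parameters at 4 bits is roughly 12 GB of weights before the KV cache, so long contexts eat the remaining headroom fast. Here is a minimal load sketch, assuming transformers with bitsandbytes 4-bit quantization; the prompt is illustrative, and 3.x vision checkpoints may need the image-text-to-text auto class rather than the causal-LM one.

```python
# Minimal 4-bit load sketch (assumes transformers + bitsandbytes + accelerate installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"

# nf4 with bf16 compute is the usual "q4-class" setting; weights land around 12 GB.
quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(  # vision-capable 3.x checkpoints may need AutoModelForImageTextToText
    MODEL_ID,
    quantization_config=quant,
    device_map="auto",  # spills layers to CPU if VRAM runs out, at a steep speed cost
)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "One-line summary of the Apache 2.0 license."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```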
Mistral Small 3.2 is the fast, Apache-licensed mid-size model in your stack. The "Small 3" line was Mistral's bet on latency-optimized 24B models that could replace 70B-class generalists at a fraction of the inference cost. The 3.x updates added 128K context and vision understanding while keeping the speed.
For products that need one model to handle chat, RAG, structured extraction, and light vision tasks behind a single API, Small 3.2 is the open-weight default in 2026.
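A serving sketch for that single-API setup, assuming vLLM's OpenAI-compatible server; the endpoint URL, flags, and prompts here are placeholders for a local deployment, not canonical values.

```python
# Assumes a vLLM server started with something like:
#   vllm serve mistralai/Mistral-Small-3.2-24B-Instruct-2506 --tokenizer-mode mistral
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
MODEL = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"

# Chat / RAG: stuff retrieved passages into the prompt as usual.
chat = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": "Context: <retrieved passages>\n\nQuestion: <user question>"},
    ],
)
print(chat.choices[0].message.content)

# Structured extraction: constrain the output to valid JSON.
extract = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Extract {vendor, total} as JSON from: 'Acme invoice, $420.'"}],
    response_format={"type": "json_object"},
)
print(extract.choices[0].message.content)
```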
When to pick it
- One Apache-2.0 model for general-purpose serving where Qwen3-8B isn't enough but Gemma 4 31B is more than you want to pay for.
- Latency-sensitive deployments. ~150 t/s on a single H100, much faster than 70B alternatives.
- Vision is a nice-to-have, not the headline feature (a sample vision call follows this list).
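For the vision nice-to-have, the same OpenAI-compatible endpoint accepts image parts in the standard multimodal message format; the image URL below is a placeholder, and a base64 data: URI works too.

```python
# Light vision call against the same local endpoint as the serving sketch above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what this image shows in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.png"}},  # placeholder URL
        ],
    }],
)
print(resp.choices[0].message.content)
```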
When to skip it
- Your hardware floor is 8GB, or you need on-device inference. Use Llama 3.2 3B or Gemma 4 E4B instead.
- You want bleeding-edge benchmarks. Qwen 3.5+ is ahead on most evals; Mistral leans on speed and license clarity.