AI Aggregator

Models  /  Mistral

Mistral Small 3.2 24B

mistralai/Mistral-Small-3.2-24B-Instruct-2506

general-chatcodingragagentsmultilingualvisionextractiongpu-16gbgpu-24gbgpu-48gbapple-silicon-32gbapple-silicon-64gb
Parameters
24.0B
Family
Mistral
License
Apache 2.0
Context length
131,072 tokens
Languages
en, multi
Modalities
text, image
Released
2025-06-20
HF downloads (30d)
475,757
Stats updated
0 days ago

Strengths

Apache 2.0 mid-size all-rounder. ~81% MMLU and 3x faster than Llama 3.3 70B at similar quality. 128K context. Vision support added in 3.x line.

Weaknesses

24B at q4 wants 16GB GPU minimum and is tight at long context. CPU inference is too slow at this size to be practical.

Mistral Small 3.2 is the fast Apache-licensed mid-size in your stack. The "Small 3" line was Mistral's bet on latency-optimized 24B models that could replace 70B-class generalists at a fraction of the inference cost. The 3.x updates added 128K context and vision understanding while keeping the speed.

For products that need one model to handle chat, RAG, structured extraction, and light vision tasks behind a single API, Small 3.2 is the open-weight default in 2026.

When to pick it

  • One Apache-2.0 model for general-purpose serving where Qwen3-8B isn't enough but Gemma 4 31B is more than you want to pay for.
  • Latency-sensitive deployments on a 24GB consumer GPU - meaningfully faster than 70B alternatives at similar quality.
  • Vision is a nice-to-have, not the headline feature.

When to skip it

  • Hardware floor is 8GB or you need on-device. Use Llama 3.2 3B or Gemma 4 E4B.
  • You want bleeding-edge benchmarks. Qwen 3.5+ is ahead on most evals; Mistral leans on speed and license clarity.