Best small LLMs for structured extraction

Models that turn messy text into clean JSON. A large share of production LLM work comes down to this.

Pull dates and amounts from invoices, normalize resumes, tag support tickets. Quietly one of the largest LLM workloads in production, and one where small models genuinely deliver.

The relevant differentiators aren't general-knowledge benchmarks. They're schema discipline, JSON-mode reliability under load, and graceful behavior when a field isn't in the source.

What we look for

  • JSON integrity - valid output 100% of the time, even on adversarial inputs.
  • Schema adherence - no invented fields, correct types.
  • Null handling - emit null for missing fields, don't invent values (see the sketch after this list).
  • Throughput - extraction is high-volume; 2x faster usually beats 5% more accurate.
  • Constrained-decoding compatibility - plays well with grammar-constrained samplers such as outlines and lm-format-enforcer.
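
Concretely, these criteria reduce to a target schema plus strict validation. Below is a minimal sketch using Pydantic; the Invoice model, its field names, and the parse_extraction helper are illustrative rather than tied to any model on this list, and the same model_json_schema() output is what constrained-decoding backends like outlines or lm-format-enforcer consume as the constraint.

  from typing import Optional

  from pydantic import BaseModel, ConfigDict, ValidationError

  class Invoice(BaseModel):
      """Target shape: exactly these fields, correct types, null when a value is absent."""
      model_config = ConfigDict(extra="forbid")  # invented fields are a hard error

      invoice_number: str
      issue_date: Optional[str] = None      # ISO date string, or null if not in the source
      total_amount: Optional[float] = None  # null, never a guessed number
      currency: Optional[str] = None

  # This JSON Schema is what a constrained-decoding backend (outlines,
  # lm-format-enforcer) takes as the grammar the model must follow.
  INVOICE_SCHEMA = Invoice.model_json_schema()

  def parse_extraction(raw_json: str) -> Optional[Invoice]:
      """Validate model output; wrong types or extra fields fail here, not downstream."""
      try:
          return Invoice.model_validate_json(raw_json)
      except ValidationError:
          return None  # route to retry or human review instead of shipping bad data

  # The behavior we score: a value missing from the source comes back as null.
  ok = parse_extraction('{"invoice_number": "INV-0042", "issue_date": null, '
                        '"total_amount": null, "currency": "EUR"}')
  bad = parse_extraction('{"invoice_number": "INV-0042", "total_amount": "unknown"}')
  assert ok is not None and bad is None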

Ranked for high-volume extraction pipelines.

Picks

  1. Mistral Small 3.2 24B · 24.0B params · Apache 2.0

    Apache 2.0 mid-size all-rounder. ~81% MMLU at ~150 tokens/s, roughly 3x faster than Llama 3.3 70B at similar quality. 128K context window. Vision support arrived in the 3.x line.

  2. Qwen3-8B Instruct · 8.2B params · Apache 2.0

    Strong all-rounder in the 7-8B class. Apache 2.0. 32K native context, 131K with YaRN. Hybrid 'thinking' mode you can toggle per request (see the sketch after these picks).

  3. Qwen2.5-VL 7B Instruct · 7.6B params · Apache 2.0

    Vision-language specialist at 7B. Beats Llama 3.2-Vision 11B on MMMU (58.6), MathVista (68.2), and DocVQA (95.7). Apache 2.0. Handles variable resolutions and aspect ratios, plus video frames.

  4. Llama 3.1 8B Instruct · 8.0B params · Llama 3.1 Community License

    The ecosystem baseline. Largest community of fine-tunes, quantizations, and inference-engine support of any open small model. Predictable in production.
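
On the per-request thinking toggle mentioned in the Qwen3-8B entry: the Qwen3 model card documents an enable_thinking argument on the chat template. A minimal sketch assuming the Hugging Face transformers stack; the model ID and the prompt text are placeholders, not part of this listing.

  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "Qwen/Qwen3-8B"  # placeholder; any Qwen3 chat checkpoint works the same way
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

  messages = [{"role": "user", "content": "Return JSON with the total and due date from: ..."}]

  # enable_thinking=False skips the reasoning preamble, which is usually what you
  # want in an extraction pipeline: lower latency, and the output is just the JSON.
  prompt = tokenizer.apply_chat_template(
      messages,
      tokenize=False,
      add_generation_prompt=True,
      enable_thinking=False,
  )

  inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
  out = model.generate(**inputs, max_new_tokens=256)
  print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))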