Phi-4 Reasoning 14B
Strengths
Punches above its weight on reasoning. Beats DeepSeek-R1-Distill-Llama-70B on AIME and GPQA while 5x smaller. Comparable to the full DeepSeek-R1 (671B) on AIME 2025. MIT license.
Weaknesses
English-only. 32K context. Not a generalist - refusal calibration and conversational warmth lag behind general-purpose fine-tunes.
Phi-4 Reasoning is the counterexample to "you need a big model to reason well." At 14B, it consistently matches or beats reasoning specialists 5x its size on math, logic, and code-reasoning benchmarks. Microsoft fine-tuned it specifically on reasoning traces, and it shows.
It is not a generalist. For chat or conversational products, pick something else. For reasoning-heavy backends - solver agents, math tutors, code-review assistants - it's hard to beat at this size.
When to pick it
- Product needs to reason through hard problems: math, logic, planning, debugging.
- MIT license; quantized, it fits on a 16GB GPU (see the sketch after this list).
- You're willing to layer a generalist for casual chat.
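A minimal sketch of what "fits on a 16GB GPU quantized" looks like in practice, assuming the Hugging Face checkpoint id `microsoft/Phi-4-reasoning` and a transformers + bitsandbytes stack; verify the exact id, VRAM headroom, and sampling settings for your setup.

```python
# Sketch: load Phi-4 Reasoning 4-bit quantized and run one reasoning prompt.
# The checkpoint id and memory figures below are assumptions, not guarantees.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "microsoft/Phi-4-reasoning"  # assumed checkpoint id

# 4-bit NF4 quantization puts the 14B weights at roughly 8-9 GB of VRAM,
# leaving room for the KV cache on a 16 GB GPU.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",
)

# Reasoning-tuned models emit a long chain of thought before the final answer,
# so budget generously for new tokens.
messages = [
    {"role": "user", "content": "How many positive integers n < 1000 are divisible by 7 but not by 5?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same pattern drops into a solver-agent or tutor backend behind an API; casual chat traffic would be routed to a separate generalist model rather than through this one.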
When to skip it
- One model for both casual chat and reasoning: Qwen3-8B's hybrid thinking mode covers more ground.
- Context needs exceed 32K.