
gpt-oss-20b

openai/gpt-oss-20b

reasoning, on-device, general-chat, agents, coding, gpu-8gb, gpu-16gb, gpu-24gb, gpu-48gb, apple-silicon-16gb, apple-silicon-32gb, apple-silicon-64gb, cpu-16gb, cpu-32gb
Parameters
21B total / 3.6B active
Family
gpt-oss
License
Apache 2.0
Context length
131,072 tokens
Languages
en, multi
Modalities
text
Released
2025-08-08
HF downloads (30d)
6,981,799
Stats updated
today

Strengths

OpenAI's small open-weight model. 21B total / 3.6B active MoE, runs in 16GB at MXFP4. Configurable reasoning effort (low/medium/high). Matches o3-mini on common reasoning evals.

Weaknesses

Older than the 2026 Qwen wave: Qwen3.5-9B beats it on most reasoning benchmarks despite being dense. Requires the Harmony response format, which not every inference stack handles cleanly.

gpt-oss-20b is OpenAI's first open-weight model since GPT-2, shipped under Apache 2.0 in August 2025. The pitch is on-device reasoning: 21B total parameters, 3.6B active per token via MoE, with MXFP4 quantization built into the release so the whole thing runs in 16GB of unified memory or VRAM.
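A rough back-of-envelope for the 16GB claim, assuming MXFP4 costs about 4.25 bits per weight (4-bit values plus shared block scales; an approximation, not the exact on-disk layout):

```python
# Rough weight-memory estimate for gpt-oss-20b at MXFP4.
# Assumes ~4.25 bits/weight (4-bit mantissas + per-block scales).
total_params = 21e9          # 21B total parameters
bits_per_weight = 4.25       # assumed MXFP4 cost including scales
weight_gb = total_params * bits_per_weight / 8 / 1e9
print(f"~{weight_gb:.1f} GB of weights")  # ~11.2 GB
```

That leaves a few gigabytes of headroom in a 16GB budget for the KV cache and runtime overhead, which is why the long 131K context still needs care at the high end.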

The configurable reasoning levels (low / medium / high) trade latency for accuracy at request time, similar to Qwen3's thinking toggle. On reasoning evals it lands close to o3-mini, which is a strong claim for a model that boots on a laptop.

When to pick it

  • On-device reasoning where 16GB is the hard ceiling.
  • You want OpenAI provenance under a permissive license.
  • Apps already wired for Harmony or OpenAI-style response formats.
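The Harmony format mentioned above interleaves channels (`analysis` for chain-of-thought, `final` for the user-facing answer) with special tokens. A sketch of pulling the final answer out of a raw completion, with token names per the published format but illustrative only; the openai-harmony library is the canonical parser:

```python
# Sketch: extracting the user-facing 'final' channel from a raw Harmony
# completion. Illustrative, not a full parser.
import re

RAW = (
    "<|start|>assistant<|channel|>analysis<|message|>"
    "User asks 2+2; trivial arithmetic.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>4<|end|>"
)

def final_channel(raw: str) -> str:
    """Return the text of the last 'final' channel message."""
    hits = re.findall(
        r"<\|channel\|>final<\|message\|>(.*?)(?:<\|end\|>|<\|return\|>|$)",
        raw, flags=re.S,
    )
    return hits[-1].strip() if hits else ""

print(final_channel(RAW))  # -> 4
```

Stacks that only understand plain OpenAI-style messages will surface the raw channel markup instead of a clean answer, which is the "not every inference stack handles it cleanly" caveat in practice.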

When to skip it

  • Pure benchmark chasing: Qwen3.5-9B out-reasons it at a smaller size.
  • Vision, multimodal, or non-English-first workloads. Text only, English-leaning.
  • Inference stack doesn't support MXFP4 or Harmony format.