
gpt-oss-20b

openai/gpt-oss-20b

reasoning, on-device, general-chat, agents, coding, gpu-8gb, gpu-16gb, gpu-24gb, gpu-48gb, apple-silicon-16gb, apple-silicon-32gb, apple-silicon-64gb, cpu-16gb, cpu-32gb
Parameters
21B total / 3.6B active
Family
gpt-oss
License
Apache 2.0
Context length
131,072 tokens
Languages
en, multi
Modalities
text
Released
2025-08-08
HF downloads (30d)
6,981,799
Stats updated
today

Strengths

OpenAI's small open-weight model. 21B total / 3.6B active MoE, runs in 16GB at MXFP4. Configurable reasoning effort (low/medium/high). Matches o3-mini on common reasoning evals.

Weaknesses

Older than the 2026 Qwen wave: Qwen3.5-9B beats it on most reasoning benchmarks despite being dense. Requires the Harmony response format, which not every inference stack handles cleanly.

gpt-oss-20b is OpenAI's first open-weight model since GPT-2, shipped under Apache 2.0 in August 2025. The pitch is on-device reasoning: 21B total parameters, 3.6B active per token via MoE, with MXFP4 quantization built into the release so the whole thing runs in 16GB of unified memory or VRAM.
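A rough back-of-envelope for the 16GB claim, assuming MXFP4 costs about 4.25 bits per weight (4-bit values plus shared block scales; an approximation, not the exact on-disk layout):

```python
# Rough weight-memory estimate for gpt-oss-20b at MXFP4.
# Assumes ~4.25 bits/weight (4-bit mantissas + per-block scales).
total_params = 21e9          # 21B total parameters
bits_per_weight = 4.25       # assumed MXFP4 cost including scales
weight_gb = total_params * bits_per_weight / 8 / 1e9
print(f"~{weight_gb:.1f} GB of weights")  # ~11.2 GB
```

That leaves a few gigabytes of headroom in a 16GB budget for the KV cache and runtime overhead, which is why the long 131K context still needs care at the high end.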

The configurable reasoning levels (low / medium / high) trade latency for accuracy at request time, similar to Qwen3's thinking toggle. On reasoning evals it lands close to o3-mini, which is a strong claim for a model that boots on a laptop.

When to pick it

  • On-device reasoning where 16GB is the hard ceiling.
  • You want OpenAI provenance under a permissive license.
  • Apps already wired for Harmony or OpenAI-style response formats.
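The Harmony format mentioned above interleaves channels (`analysis` for chain-of-thought, `final` for the user-facing answer) with special tokens. A sketch of pulling the final answer out of a raw completion, with token names per the published format but illustrative only; the openai-harmony library is the canonical parser:

```python
# Sketch: extracting the user-facing 'final' channel from a raw Harmony
# completion. Illustrative, not a full parser.
import re

RAW = (
    "<|start|>assistant<|channel|>analysis<|message|>"
    "User asks 2+2; trivial arithmetic.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>4<|end|>"
)

def final_channel(raw: str) -> str:
    """Return the text of the last 'final' channel message."""
    hits = re.findall(
        r"<\|channel\|>final<\|message\|>(.*?)(?:<\|end\|>|<\|return\|>|$)",
        raw, flags=re.S,
    )
    return hits[-1].strip() if hits else ""

print(final_channel(RAW))  # -> 4
```

Stacks that only understand plain OpenAI-style messages will surface the raw channel markup instead of a clean answer, which is the "not every inference stack handles it cleanly" caveat in practice.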

When to skip it

  • Pure benchmark chasing: Qwen3.5-9B out-reasons it at a smaller size.
  • Vision, multimodal, or non-English-first workloads. Text only, English-leaning.
  • Inference stack doesn't support MXFP4 or Harmony format.