Models / Qwen
Qwen3-8B Instruct
Tags: general-chat, coding, reasoning, rag, agents, multilingual, extraction, gpu-8gb, gpu-16gb, gpu-24gb, gpu-48gb, apple-silicon-16gb, apple-silicon-32gb, cpu-16gb, cpu-32gb, datacenter
Strengths
Strong all-rounder in the 7-8B class. Apache 2.0 licensed. 32K native context, extensible to 131K with YaRN. Hybrid "thinking" mode you can toggle per request.
Weaknesses
Tokenizer is optimized for CJK, so English-only deployments pay more tokens per byte of text. Safety guardrails feel over-tuned in some domains.
Qwen3-8B is the small-model pick most teams reach for in 2026. It out-benches Llama 3.1 8B and Mistral 7B on essentially every public eval, with a clean Apache 2.0 license.
The hybrid "thinking" mode is the architectural shift worth knowing: at inference you can toggle deeper chain-of-thought per request, trading latency for accuracy. Genuinely useful for agentic flows that occasionally need to plan.
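A minimal sketch of the per-request toggle, assuming an OpenAI-compatible server (such as vLLM) that forwards `chat_template_kwargs` to the model's chat template; the model name is the published checkpoint, but the helper function and prompts are illustrative, not part of any official API.

```python
def build_request(prompt: str, think: bool) -> dict:
    """Build a chat-completion payload with thinking toggled per request."""
    return {
        "model": "Qwen/Qwen3-8B",
        "messages": [{"role": "user", "content": prompt}],
        # Qwen3's chat template reads enable_thinking to decide whether the
        # model emits a <think>...</think> block before its final answer.
        "chat_template_kwargs": {"enable_thinking": think},
    }

# Cheap, low-latency path for routine turns; deeper reasoning only when needed.
fast = build_request("Summarize this ticket.", think=False)
slow = build_request("Plan a three-step database migration.", think=True)
```

The practical pattern is to default to `think=False` and flip it on only for the agent steps that plan or decompose, which keeps median latency close to a plain instruct model.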
When to pick it
- One Apache-2.0 small model that handles chat, code, RAG, and light reasoning.
- Multilingual users (especially CJK).
- Long context (131K via YaRN) without Llama's licensing complexity.
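The 131K figure comes from stretching the 32K native window roughly 4x with YaRN. A sketch of the `rope_scaling` block you would merge into the model's `config.json`, following the recipe Qwen publishes for its models; treat the exact field names and factor as an assumption to verify against your serving stack's docs:

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Note that static YaRN scaling applies even to short prompts, so it is usually best enabled only on deployments that actually serve long-context traffic.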
When to skip it
- Your inference stack doesn't support the hybrid thinking-mode toggle.
- You serve English only and tokens-per-dollar matters: Llama's tokenizer is more efficient on English text.