Qwen3-8B Instruct

Qwen/Qwen3-8B

Tags: general-chat, coding, reasoning, rag, agents, multilingual, extraction, gpu-8gb, gpu-16gb, gpu-24gb, gpu-48gb, apple-silicon-16gb, apple-silicon-32gb, cpu-16gb, cpu-32gb, datacenter
Parameters: 8.2B
Family: Qwen
License: Apache 2.0
Context length: 32,768 tokens native; 131,072 with YaRN
Languages: en, zh, multilingual
Modalities: text
Released: 2025-04-29
HF downloads (30d): 10,018,533

Strengths

Strong all-rounder in the 7-8B class. Apache 2.0. 32K native context, 131K with YaRN. Hybrid 'thinking' mode you can toggle per request.

Weaknesses

Tokenizer is optimized for CJK at English's expense: English-only deployments pay more tokens per byte than they would with Llama's tokenizer. Safety guardrails feel over-tuned in some domains.
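
A rough way to test the tokens-per-byte claim on your own traffic, as a minimal sketch with Hugging Face transformers: count Qwen3 tokens per UTF-8 byte on representative text, then rerun with your alternative tokenizer's repo id to compare. The sample strings are placeholders.

    # Measure how many tokens Qwen3's tokenizer spends per UTF-8 byte.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

    samples = {
        "english": "The quick brown fox jumps over the lazy dog near the river bank.",
        "chinese": "敏捷的棕色狐狸跳过河岸边那只懒惰的狗。",
    }
    for name, text in samples.items():
        n_tokens = len(tok.encode(text, add_special_tokens=False))
        n_bytes = len(text.encode("utf-8"))
        print(f"{name}: {n_tokens} tokens / {n_bytes} bytes "
              f"= {n_tokens / n_bytes:.2f} tokens per byte")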

Qwen3-8B is the small-model pick most teams reach for in 2026. It out-benches Llama 3.1 8B and Mistral 7B on essentially every public eval, with a clean Apache 2.0 license.

The hybrid "thinking" mode is the architectural shift worth knowing: at inference you can toggle deeper chain-of-thought per request, trading latency for accuracy. Genuinely useful for agentic flows that occasionally need to plan.
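
A minimal sketch of the per-request toggle via Hugging Face transformers. The enable_thinking flag on the chat template is what Qwen documents for Qwen3; the prompt and generation settings here are illustrative.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen3-8B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

    messages = [{"role": "user", "content": "Plan a three-step refactor of a flaky test suite."}]

    # enable_thinking=True makes the model emit a <think>...</think> block before
    # answering (slower, better on hard problems); False skips straight to the answer.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
    )

    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=1024)
    print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))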

When to pick it

  • One Apache-2.0 small model that handles chat, code, RAG, and light reasoning.
  • Multilingual users (especially CJK).
  • Long context (131K via YaRN; see the config sketch after this list) without Llama's licensing complexity.
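
The 131K figure requires enabling YaRN rope scaling yourself; the model ships with a 32K native window. Here is a sketch of the transformers-side override, following the rope_scaling shape Qwen's docs describe (the keys and the 4.0 factor should be verified against your transformers version):

    from transformers import AutoConfig, AutoModelForCausalLM

    model_id = "Qwen/Qwen3-8B"
    config = AutoConfig.from_pretrained(model_id)
    # Stretch RoPE from the 32,768-token native window by 4x: 32,768 * 4 = 131,072.
    config.rope_scaling = {
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    }
    config.max_position_embeddings = 131072

    model = AutoModelForCausalLM.from_pretrained(
        model_id, config=config, torch_dtype="auto", device_map="auto"
    )

Serving stacks expose the same knob (vLLM, for example, accepts an equivalent rope-scaling JSON argument). Qwen's docs suggest enabling YaRN only when you actually need long inputs, since static scaling can cost some short-context quality.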

When to skip it

  • Your inference stack doesn't support the hybrid thinking-mode toggle.
  • You're English-only and tokens-per-dollar matters: Llama's tokenizer is more efficient on English text.