AI Aggregator


Qwen3.5-9B

Qwen/Qwen3.5-9B

Tags: general-chat, coding, reasoning, rag, agents, multilingual, vision, extraction, gpu-8gb, gpu-16gb, gpu-24gb, gpu-48gb, apple-silicon-16gb, apple-silicon-32gb, apple-silicon-64gb, cpu-16gb, cpu-32gb
Parameters: 9.0B
Family: Qwen
License: Apache 2.0
Context length: 262,144 tokens
Languages: en, zh, multi
Modalities: text, image, video
Released: 2026-03-02
HF downloads (30d): 7,530,112
Stats updated: today
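The hardware tags above (gpu-8gb through gpu-48gb, plus the Apple-silicon and CPU tiers) roughly track weight size at common quantizations. A quick back-of-envelope sketch, assuming 9.0B parameters and deliberately ignoring KV cache, activations, and runtime overhead:

```python
# Rough weight-memory footprint for a 9.0B-parameter model at common
# precisions. Ignores KV cache, activations, and runtime overhead,
# so treat these figures as lower bounds on required memory.
PARAMS = 9.0e9

BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

def weight_gb(bytes_per_param: float, params: float = PARAMS) -> float:
    """Weight size in GiB for a given bytes-per-parameter precision."""
    return params * bytes_per_param / 2**30

for name, bpp in BYTES_PER_PARAM.items():
    print(f"{name:>10}: {weight_gb(bpp):5.1f} GiB")
```

At fp16 the weights alone land around 17 GiB (hence the gpu-24gb tier), int8 fits a 16 GB card, and a 4-bit quant squeezes under 8 GB with room for a modest context.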

Strengths

Native multimodal at the 9B mark. 262K context (1M with YaRN). Apache 2.0. Early-fusion training rolls vision into the base model rather than bolting on a separate encoder.
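The 1M figure comes from YaRN extrapolation over the 262K native window, and the scaling factor is just the ratio of the two. A sketch of the usual Transformers-style `rope_scaling` override; the field names follow the convention used by earlier Qwen releases and should be verified against this model's shipped config.json:

```python
NATIVE_CONTEXT = 262_144    # native window from the model card
TARGET_CONTEXT = 1_048_576  # 1M-token target with YaRN

factor = TARGET_CONTEXT / NATIVE_CONTEXT  # -> 4.0

# Transformers-style rope_scaling override. Field names follow the
# convention from earlier Qwen releases; check the shipped
# config.json before relying on them for this checkpoint.
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": NATIVE_CONTEXT,
}
print(rope_scaling)
```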

Weaknesses

Hybrid Gated-Delta + sparse-MoE architecture is new enough that some inference stacks lag behind. Tokenizer still favors CJK over English.

Qwen3.5-9B is the small-model upgrade that quietly absorbed the VL line. Where Qwen3 needed a separate Qwen2.5-VL for vision, the 3.5 series trains text and image tokens jointly from scratch, so a single 9B checkpoint covers chat, code, RAG, multilingual, and image / short-video understanding.
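In practice, a single checkpoint means one chat template serves both text and vision turns. A minimal sketch of the mixed-content message format most multimodal chat stacks accept; the exact key names ("image_url" vs. "image") vary by processor, so check this model's chat template before copying:

```python
# A mixed text + image chat turn in the OpenAI-style content-parts
# format that most multimodal chat templates accept. Key names are
# an assumption; verify against the model's shipped chat template.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text",
             "text": "Summarize the trend in this chart."},
        ],
    }
]

# Both modalities flow through the same 9B checkpoint; there is no
# separate vision model to route image turns to.
part_types = {part["type"] for part in messages[0]["content"]}
print(part_types)
```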

The numbers tracked elsewhere are eye-catching ("9B beats 120B on certain tasks"), but the practical takeaway is simpler: this is the new default in the 7-9B class, with a longer native context (262K) than anything else at this size and a permissive license.

When to pick it

  • One Apache-2.0 model that handles chat, RAG, coding, and image understanding.
  • Long-context workloads that want native 262K without YaRN scaling.
  • Multilingual deployments - 200+ languages represented in the training mix.

When to skip it

  • Inference stack doesn't yet support hybrid Gated Delta / sparse MoE attention. Check llama.cpp / vLLM compatibility first.
  • Vision quality is "useful," not "frontier" - reach for Gemma 4 31B or Qwen3.6-27B for serious VL workloads.
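A cheap preflight before committing to a serving stack is to read the checkpoint's config.json and compare its declared architecture against what your runtime actually registers. A sketch using a hypothetical architecture string (the real one ships in the Qwen/Qwen3.5-9B repo) and a placeholder support set you would populate from your runtime's own model registry:

```python
import json

# Hypothetical config.json excerpt; the real architecture string
# ships in the model repo and may differ.
config_json = '{"architectures": ["Qwen35ForCausalLM"], "model_type": "qwen3_5"}'

# Architectures this stack is known to support. Populate from your
# runtime's registry (e.g. vLLM's supported-models list) rather than
# hard-coding like this illustrative placeholder set.
SUPPORTED = {"LlamaForCausalLM", "Qwen2ForCausalLM"}

arch = json.loads(config_json)["architectures"][0]
if arch not in SUPPORTED:
    print(f"{arch} is not in this stack's registry; "
          "expect a fallback path or an outright load failure")
```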