Models / Qwen
Qwen3.5-9B
Strengths
Native multimodal at the 9B mark. 262K-token native context (1M with YaRN). Apache 2.0 license. Early-fusion training rolls vision into the base model rather than bolting on a separate encoder.
Weaknesses
Hybrid Gated-Delta + sparse-MoE architecture is new enough that some inference stacks lag behind it. Tokenizer efficiency still favors CJK text over English.
Qwen3.5-9B is the small-model upgrade that quietly absorbed the VL line. Where Qwen3 needed a separate Qwen2.5-VL for vision, the 3.5 series trains text and image tokens jointly from scratch, so a single 9B checkpoint covers chat, code, RAG, multilingual use, and image/short-video understanding.
The numbers tracked elsewhere are eye-catching ("9B beats 120B on certain tasks"), but the practical takeaway is simpler: this is the new default in the 7-9B class, with a longer native context (262K) than anything else at this size and a permissive license.
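Because vision is trained into the base model, image and text turns share one chat payload. A minimal sketch, assuming the content-list message style used by recent Qwen-VL chat templates; the image path is a placeholder, and the exact field names for this checkpoint are an assumption:

```python
# Hypothetical multimodal chat payload: one user turn mixing an image and text.
# Field layout follows the Qwen-VL content-list convention; not confirmed for
# this specific checkpoint.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/chart.png"},  # placeholder path
            {"type": "text", "text": "Summarize the trend in this chart."},
        ],
    }
]
```

A processor's chat template would render this into interleaved image and text tokens before generation; the same structure extends to multiple images or short video frames per turn.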
When to pick it
- One Apache-2.0 model that handles chat, RAG, coding, and image understanding.
- Long-context workloads that want native 262K without YaRN scaling.
- Multilingual deployments - 200+ languages in the training mix.
When to skip it
- Your inference stack doesn't yet support the hybrid Gated-Delta + sparse-MoE architecture. Check llama.cpp / vLLM compatibility first.
- Vision quality is "useful," not "frontier" - reach for Gemma 4 31B or Qwen3.6-27B for serious VL workloads.