Fine-tuning Qwen3-8B

Base model on Hugging Face: Qwen/Qwen3-8B-Base

Tokenizer: Qwen byte-level BPE (tiktoken-style, 151K vocab, multilingual-leaning)
License: Apache 2.0. No AUP, no MAU caps. Cleanest license in the small-model space.
Ecosystem: Strong and growing. First-class support in TRL, Axolotl, Unsloth, llama.cpp, vLLM, SGLang.

The strongest small base for new fine-tuning projects in 2026. Apache 2.0 removes legal friction, the base model's quality gives you a higher starting ceiling than Llama 3.1, and the hybrid thinking architecture is itself a fine-tuning lever: you decide during training whether the model keeps or drops its reasoning traces.

Recommended training stacks

  • Axolotl - tested Qwen3 configs upstream; supports the hybrid thinking-mode toggle in training.
  • Unsloth - Qwen3 LoRA support landed in late 2025; matches Llama LoRA throughput.
  • Hugging Face TRL - the tokenizer loads straight from the Hub repo, so no special handling is needed; see the SFT sketch after this list.
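
As a starting point, here is a minimal TRL sketch, assuming a recent TRL release with PEFT installed; the placeholder dataset and the LoRA/optimizer hyperparameters are illustrative assumptions, not tested recommendations from this page.

```python
# Minimal LoRA SFT sketch for Qwen3-8B-Base with TRL.
# Dataset and hyperparameters below are placeholders; swap in your own.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Plain-text placeholder dataset from the TRL quickstart; replace with your corpus.
dataset = load_dataset("stanfordnlp/imdb", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="qwen3-8b-base-sft",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

# Passing the Hub id as a string lets TRL load both the model and its tokenizer.
trainer = SFTTrainer(
    model="Qwen/Qwen3-8B-Base",
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```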

Watch out for

  • Tokenizer overhead for English - the 151K vocab favors CJK, so English-only data comes out ~5-10% more tokens than with Llama's tokenizer. Plan dataset budgets accordingly; a quick way to measure this on your own corpus is sketched after this list.
  • Thinking-mode prompts - if your training data lacks <think>...</think> traces, the fine-tune can collapse the model's thinking ability. Either include traces or explicitly disable thinking mode during training; the prompt-format sketch after this list shows both variants.
  • Safety tuning - heavier than Llama's. If your domain has a legitimate need for sensitive content (medical, security research), the baked-in refusals are harder to train away. Plan evals accordingly.
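
To size the tokenizer overhead against your own data rather than trusting the rule of thumb, here is a small comparison sketch; meta-llama/Llama-3.1-8B is gated on the Hub, and sample_english_corpus.txt is a hypothetical stand-in for a representative slice of your dataset.

```python
# Compare English token counts under the Qwen3 and Llama 3.1 tokenizers.
from transformers import AutoTokenizer

qwen = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-Base")
llama = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")  # gated: accept the license first

# Hypothetical file name; use any representative sample of your training text.
with open("sample_english_corpus.txt") as f:
    text = f.read()

n_qwen = len(qwen(text)["input_ids"])
n_llama = len(llama(text)["input_ids"])
print(f"Qwen3: {n_qwen} tokens | Llama 3.1: {n_llama} tokens "
      f"| overhead vs. Llama: {100 * (n_qwen / n_llama - 1):+.1f}%")
```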
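And a sketch of the two prompt formats the hybrid thinking mode expects, so you can match your SFT targets to whichever mode you keep. It uses the chat template shipped with the post-trained Qwen/Qwen3-8B repo (not the base model above), and the example message is made up.

```python
# Render a prompt with thinking enabled and disabled to see what the
# fine-tuned model will be asked to continue in each mode.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")  # post-trained repo carries the chat template

messages = [{"role": "user", "content": "What is 17 * 23?"}]

# Thinking on: SFT targets for this format should open with a <think>...</think>
# trace before the final answer.
thinking_prompt = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Thinking off: the template injects an empty <think></think> block, and SFT
# targets should start directly with the answer.
plain_prompt = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

print(thinking_prompt)
print(plain_prompt)
```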