Fine-tuning Qwen3-8B
Base model on Hugging Face: Qwen/Qwen3-8B-Base
The strongest small base for new fine-tuning projects in 2026. Apache 2.0 removes legal friction, the base model's quality means you start from a higher ceiling than Llama 3.1, and the hybrid-thinking architecture is uniquely fine-tunable.
Recommended training stacks
- Axolotl - tested Qwen3 configs upstream; supports the hybrid thinking-mode toggle in training.
- Unsloth - Qwen3 LoRA support landed in late 2025; matches Llama LoRA throughput.
- HuggingFace TRL - the tokenizer and chat template load straight from the model card; no custom setup needed.
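A minimal Axolotl LoRA config for Qwen3-8B-Base might look like the sketch below. The dataset path and hyperparameters are illustrative placeholders, not tuned values; check the Axolotl repo's upstream Qwen3 examples for current defaults.

```yaml
# Sketch of an Axolotl LoRA config for Qwen3-8B-Base (values are placeholders)
base_model: Qwen/Qwen3-8B-Base

adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: ./data/train.jsonl   # hypothetical dataset path
    type: chat_template

sequence_len: 4096
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 2
learning_rate: 2.0e-4
optimizer: adamw_torch
lr_scheduler: cosine

bf16: true
flash_attention: true
output_dir: ./outputs/qwen3-8b-lora
```

Launch would follow Axolotl's usual `accelerate launch -m axolotl.cli.train config.yml` flow.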
Watch out for
- Tokenizer overhead for English - 151K vocab favors CJK; English-only data produces ~5-10% more tokens than Llama. Plan dataset budgets accordingly.
- Thinking-mode prompts - if training data lacks <think>...</think> traces, the fine-tune may collapse the thinking ability. Either include traces or disable thinking mode during training.
- Heavier safety tuning than Llama - it is harder to fine-tune away refusals when your domain has a legitimate need for sensitive content (medical, security research). Plan evals accordingly.
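One way to keep thinking traces in the data is to render each assistant turn with an explicit <think> block. A minimal sketch follows; the helper name is hypothetical, and while the ChatML-style <|im_start|>/<|im_end|> markers match Qwen3's chat format, in practice you should let `tokenizer.apply_chat_template` produce the exact template from the model card.

```python
def format_sample(user_msg: str, reasoning: str, answer: str) -> str:
    """Render one SFT sample with the reasoning kept in a <think> block.

    Hypothetical helper: the assistant turn carries the trace inside
    <think>...</think> followed by the final answer, so fine-tuning
    does not collapse the model's thinking ability.
    """
    assistant = f"<think>\n{reasoning}\n</think>\n\n{answer}"
    return (
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>\n"
    )

sample = format_sample(
    "What is 17 * 6?",
    "17 * 6 = (17 * 6) = 102.",
    "102",
)
print(sample)
```

If your data has no traces at all, the safer alternative from the list above is to train with thinking mode disabled rather than mixing trace-free answers into the thinking format.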