Fine-tuning Llama 3.1 8B
Base model on Hugging Face: meta-llama/Llama-3.1-8B
If you're new to fine-tuning a small open model, start here. Llama 3.1 8B has the biggest community of training recipes, the best inference support, and the most quantization formats. You'll find a working LoRA recipe for almost any domain.
Recommended training stacks
- Unsloth - fastest single-GPU LoRA / QLoRA. Fits a 24GB GPU comfortably for 8B at 4K-8K context.
- Axolotl - most flexible. YAML config, supports SFT/DPO/ORPO/KTO in one pipeline.
- HuggingFace TRL - clean Python API for custom training loops.
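To make the Axolotl option concrete, here is a minimal QLoRA config sketch. The field names follow Axolotl's YAML schema, but every value (dataset path, hyperparameters, output directory) is illustrative, not a tuned recipe:

```yaml
# Hypothetical Axolotl QLoRA config for Llama 3.1 8B -- values are
# placeholders to show the shape of a config, not a tested recipe.
base_model: meta-llama/Llama-3.1-8B
load_in_4bit: true
adapter: qlora

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: ./data/my_dataset.jsonl   # placeholder path
    type: alpaca                    # pick the type matching your data

sequence_len: 4096
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 3
learning_rate: 2e-4
optimizer: adamw_torch

output_dir: ./outputs/llama31-8b-qlora
```

Launched with Axolotl's CLI, this trains a 4-bit QLoRA adapter over all linear layers; adjust `sequence_len` and batch settings to your GPU's memory.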
Watch out for
- Special tokens - `<|begin_of_text|>`, `<|eot_id|>`, etc. must be preserved. Most frameworks handle this; check if you're rolling your own dataloader.
- Acceptable-use policy - Meta's AUP excludes some applications even with commercial use. Check before fine-tuning a use-case-specific variant.
- Chat template strictness - any drift from the official chat template (extra whitespace, missing headers, wrong turn markers) noticeably hurts eval scores.
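To illustrate the first and third points, here is a sketch that assembles the Llama 3.1 chat format by hand, so you can see exactly which special tokens a custom dataloader must keep intact. The `format_llama31` helper is hypothetical; in real code, prefer the tokenizer's `apply_chat_template` over manual string assembly so you stay on the official template:

```python
# Manual sketch of the Llama 3.1 chat format (single system + user turn).
# Shown only to make the special tokens visible; use the tokenizer's
# apply_chat_template in practice.
def format_llama31(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The prompt ends with an open assistant header: the model
        # generates the reply and emits <|eot_id|> when done.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama31("You are a helpful assistant.", "Hi!")
```

If a preprocessing step strips or re-tokenizes strings like `<|eot_id|>` as plain text instead of single special tokens, training silently degrades, which is why the bullet above says to check any hand-rolled dataloader.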