Fine-tuning Llama 3.1 8B
Base model on Hugging Face: meta-llama/Llama-3.1-8B
If you're new to fine-tuning a small open model, start here. Llama 3.1 8B has the biggest community of training recipes, the best inference support, and the most quantization formats. You'll find a working LoRA recipe for almost any domain.
Recommended training stacks
- Unsloth - fastest single-GPU LoRA / QLoRA. Fits a 24GB GPU comfortably for 8B at 4K-8K context.
- Axolotl - most flexible. YAML config, supports SFT/DPO/ORPO/KTO in one pipeline.
- HuggingFace TRL - clean Python API for custom training loops.
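To make the Axolotl option concrete, here is a minimal QLoRA config sketch. The field names follow Axolotl's YAML schema, but every value (dataset path, hyperparameters, output directory) is illustrative, not a tuned recipe:

```yaml
# Hypothetical Axolotl QLoRA config for Llama 3.1 8B -- values are
# placeholders to show the shape of a config, not a tested recipe.
base_model: meta-llama/Llama-3.1-8B
load_in_4bit: true
adapter: qlora

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: ./data/my_dataset.jsonl   # placeholder path
    type: alpaca                    # pick the type matching your data

sequence_len: 4096
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 3
learning_rate: 2e-4
optimizer: adamw_torch

output_dir: ./outputs/llama31-8b-qlora
```

Launched with Axolotl's CLI, this trains a 4-bit QLoRA adapter over all linear layers; adjust `sequence_len` and batch settings to your GPU's memory.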
Watch out for
- Special tokens - `<|begin_of_text|>`, `<|eot_id|>`, etc. must be preserved. Most frameworks handle this; check if you're rolling your own dataloader.
- Acceptable-use policy - Meta's AUP excludes some applications even with commercial use. Check before fine-tuning a use-case-specific variant.
- Chat template strictness - any drift from the official chat template (extra whitespace, missing headers, wrong turn markers) noticeably hurts eval scores.
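To illustrate the first and third points, here is a sketch that assembles the Llama 3.1 chat format by hand, so you can see exactly which special tokens a custom dataloader must keep intact. The `format_llama31` helper is hypothetical; in real code, prefer the tokenizer's `apply_chat_template` over manual string assembly so you stay on the official template:

```python
# Manual sketch of the Llama 3.1 chat format (single system + user turn).
# Shown only to make the special tokens visible; use the tokenizer's
# apply_chat_template in practice.
def format_llama31(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The prompt ends with an open assistant header: the model
        # generates the reply and emits <|eot_id|> when done.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama31("You are a helpful assistant.", "Hi!")
```

If a preprocessing step strips or re-tokenizes strings like `<|eot_id|>` as plain text instead of single special tokens, training silently degrades, which is why the bullet above says to check any hand-rolled dataloader.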