AI Aggregator

Fine-tuning launchpad

Fine-tuning Gemma 4 E4B

Base model on HF: google/gemma-4-e4b-pt  ·  Model page →

Tokenizer
Gemma SentencePiece (256K vocab, multimodal-aware)
License
Apache 2.0. Cleaner than the older Gemma terms, with the same effective freedoms as Apache-licensed Qwen and Mistral releases.
Ecosystem
Newer (April 2026). LoRA support via TRL is solid; community fine-tunes are catching up. Best for on-device or multimodal fine-tunes.

Gemma 4 E4B is the right base when you need a small fine-tuned model that handles multimodal input. The ~4B effective footprint means LoRA adapters fit on consumer GPUs with room to spare, and Apache 2.0 removes the licensing friction of earlier Gemma releases.
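To make the "room to spare" claim concrete, here is a back-of-envelope adapter-size calculation. The hidden size, layer count, and rank below are hypothetical placeholders, not the actual Gemma 4 E4B configuration; the point is the formula, not the numbers.

```python
# Rough LoRA adapter sizing. All dimensions are HYPOTHETICAL
# placeholders, not the real Gemma 4 E4B config.
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA adds two low-rank matrices per adapted linear layer:
    # A (rank x d_in) and B (d_out x rank).
    return rank * d_in + d_out * rank

HIDDEN = 2048    # assumed hidden size
N_LAYERS = 30    # assumed decoder layers
RANK = 16

# Adapting the q/k/v/o projections only (square, HIDDEN x HIDDEN here).
per_layer = 4 * lora_params(HIDDEN, HIDDEN, RANK)
total = N_LAYERS * per_layer
print(f"adapter params: {total:,}")              # ~7.9M
print(f"fp16 adapter size: {total * 2 / 2**20:.1f} MiB")
```

Even with generous assumptions, the adapter weights land in the tens of megabytes; the dominant VRAM cost during training is the frozen base model plus optimizer state for the adapter, not the adapter itself.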

Recommended training stacks

  • Hugging Face TRL with PEFT - canonical multimodal-aware path. Use gemma-4-e4b-pt as the base for vision/audio fine-tunes.
  • Unsloth - text-only Gemma 4 LoRA tested upstream; vision pathways still maturing.
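For the TRL + PEFT path, a minimal LoRA recipe looks like the sketch below. It is written as plain kwargs so it can be read without the libraries installed; the target module names, rank, and learning rate are assumptions, not verified values for Gemma 4 E4B.

```python
# Hypothetical LoRA recipe for the TRL + PEFT path. Hyperparameters
# and target module names are ASSUMPTIONS to illustrate the shape of
# the config, not known-good values for Gemma 4 E4B.
BASE_MODEL = "google/gemma-4-e4b-pt"  # base checkpoint from the doc

lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,        # common 2x-rank heuristic
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed names
    "task_type": "CAUSAL_LM",
}

sft_kwargs = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,  # effective batch of 8
    "learning_rate": 2e-4,             # typical LoRA starting point; tune per task
    "bf16": True,
}

# In a real run these feed the library objects directly:
#   peft_config = peft.LoraConfig(**lora_kwargs)
#   trainer = trl.SFTTrainer(model=BASE_MODEL, peft_config=peft_config,
#                            args=trl.SFTConfig(**sft_kwargs),
#                            train_dataset=ds)
#   trainer.train()
print(BASE_MODEL, lora_kwargs["r"], sft_kwargs["learning_rate"])
```

Start from a low rank and a short run to validate the data pipeline before scaling anything; on a ~4B-effective model the adapter itself is rarely the bottleneck.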

Watch out for

  • Multimodal token interleaving - image, audio, and text tokens follow a specific pattern. Deviating from it degrades quality silently rather than raising errors.
  • Effective-parameter accounting - LoRA against the inference profile costs about what a 4B model would, but full SFT updates the full parameter set, which is larger than the "E4B" name suggests.
  • Quantized-format evals - always evaluate on the quantized format you'll ship. Quantization-induced regressions hit small multimodal models harder than text-only ones.
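On the interleaving point above: the safe pattern is to describe multimodal turns as structured messages and let the processor's chat template expand them, rather than hand-assembling token sequences. The schema below follows the general Hugging Face list-of-content-parts convention; whether Gemma 4's processor uses exactly these keys is an assumption, so check the model card.

```python
# Illustrative interleaved multimodal example in the HF chat-message
# convention. The exact schema the Gemma 4 processor expects is an
# ASSUMPTION here; verify against the model card before training.
example = [
    {
        "role": "user",
        "content": [
            {"type": "image", "path": "receipt.png"},   # non-text part first
            {"type": "text", "text": "What is the total on this receipt?"},
        ],
    },
    {
        "role": "assistant",
        "content": [{"type": "text", "text": "The total is $42.17."}],
    },
]

# A processor's apply_chat_template expands each non-text part into the
# model's special image/audio tokens in exactly this order; reordering
# or hand-splicing parts is where silent quality loss creeps in.
user_parts = [part["type"] for part in example[0]["content"]]
print(user_parts)  # ['image', 'text']
```

Keeping the ordering in data (rather than in string-concatenation code) makes it easy to assert the pattern holds across the whole training set before you spend GPU hours.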