Gemma 4 Fine-Tuning on Single GPU | Training Gemma 4 With Hugging Face on Custom Dataset (🔴 Live)

2026-04-14 · Source: Venelin Valkov · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Intermediate, extended

Summary

This content details a successful fine-tuning process for the Gemma 4 (2B parameter) model on a custom dataset, addressing issues encountered in a previous attempt. The author demonstrates replacing the `unswot` library with Hugging Face's `trl` library, specifically using `sft trainer`, and following Google's recommended pipeline. The fine-tuning was performed on a machine with 24 GB of VRAM, utilizing a custom dataset of 1,000 examples (900 for training, 100 for testing) structured as Slack messages requiring JSON output for priority and on-call page status. The process involved data preparation into a ShareGPT-like format, LoRA adapter configuration targeting all linear modules, and training for four epochs. Post-training evaluation showed a 99% accuracy on the test set, with the model successfully generating valid JSON outputs, a significant improvement over the untrained base model. The content also covers model saving, merging the LoRA adapter with the base model, and a brief discussion on quantization for inference optimization.

Key takeaway

For AI Engineers and ML Students looking to fine-tune Gemma 4 for structured output tasks, prioritize using Hugging Face's `trl` library with `sft trainer` and LoRA adapters. Ensure your custom dataset is well-formatted (e.g., ShareGPT-like) and sufficiently large (at least 1,000 examples) with balanced representation across categories. This approach can yield high accuracy and enable deployment on GPUs with 24GB VRAM, even for models with 5 billion parameters.

Key insights

Fine-tuning Gemma 4 with Hugging Face's `trl` library and LoRA adapters significantly improves structured JSON output accuracy.

Principles

Use `trl`'s `sft trainer` for Gemma 4 fine-tuning.
LoRA adapters enable efficient training on smaller GPUs.
A balanced dataset improves model performance.

Method

The method involves preparing a custom dataset into a ShareGPT-like format, configuring LoRA adapters targeting all linear modules, and training the Gemma 4 base model using Hugging Face's `sft trainer` for four epochs on a 24GB VRAM GPU.

In practice

Target all linear modules for LoRA adapters.
Aim for at least 1,000 balanced examples for fine-tuning.
Merge LoRA adapter with base model for deployment.

Topics

Gemma 4 Fine-tuning
Hugging Face TRL Library
LoRA Adapters
Single GPU Training
Custom Dataset

Best for: Machine Learning Engineer, AI Engineer, AI Student

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.