Styled Text Image Generation with Eruku on AMD
Summary
Eruku is a styled text image generation model, presented at WACV 2026, that addresses the challenge of generating readable, controllable text images that faithfully match a target visual style. It uses an autoregressive encoder-decoder Transformer rather than a diffusion model, and it emits an explicit end-of-generation token so that generation stops reliably. The model was trained on a synthetic dataset of 20 million samples and fine-tuned on a smaller dataset of longer samples, and it shows competitive performance against its predecessor Emuru and against diffusion-based methods such as DiffusionPen. Eruku was developed and trained on the LUMI supercomputer using AMD Instinct™ MI250X GPUs and ROCm™, and its code runs on both CUDA and ROCm™.
Key takeaway
For AI Engineers and ML Scientists working on text image generation, Eruku offers a robust autoregressive solution for high-fidelity styled text. You should consider deploying Eruku on AMD Instinct™ MI250X GPUs with ROCm™ for scalable training and inference, especially for tasks requiring precise control over text content and visual style, such as synthetic document creation or handwriting synthesis.
Key insights
Eruku is an autoregressive Transformer for styled text image generation that is competitive with diffusion-based methods such as DiffusionPen in text control and style fidelity.
Principles
- Autoregressive models excel in variable-length text image generation.
- Explicit end-of-generation tokens enhance content adherence.
- Synthetic datasets can effectively train robust text generation models.
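The first two principles can be illustrated with a toy decoding loop. This is a minimal sketch and not Eruku's code: `EOG`, `generate`, and `make_toy_step` are hypothetical names, and the "model" is a stand-in function that simply emits a fixed number of tokens before signalling the end of generation.

```python
from typing import Callable, List

EOG = -1  # hypothetical id reserved for the end-of-generation token


def generate(step: Callable[[List[int]], int], max_steps: int = 64) -> List[int]:
    """Autoregressive loop: each step sees all tokens produced so far.

    Generation ends when the step function returns EOG, or when the
    safety cap max_steps is reached.
    """
    out: List[int] = []
    for _ in range(max_steps):
        nxt = step(out)
        if nxt == EOG:
            break
        out.append(nxt)
    return out


def make_toy_step(target_len: int) -> Callable[[List[int]], int]:
    """Stand-in for a trained decoder: emits target_len tokens, then EOG."""
    def step(prefix: List[int]) -> int:
        if len(prefix) >= target_len:
            return EOG
        return len(prefix) % 7  # arbitrary placeholder token id
    return step


short = generate(make_toy_step(3))                     # stops after 3 tokens
longer = generate(make_toy_step(10))                   # stops after 10 tokens
capped = generate(make_toy_step(1000), max_steps=16)   # never sees EOG, hits the cap
```

The output length is decided by the model through the stop token rather than fixed in advance, which is what lets one loop serve both short and long text lines. The `max_steps` cap is a common safeguard for the case where the stop token is never produced.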
Method
Eruku tokenizes text with ByT5 and processes style images via Emuru's VAE encoder. An encoder-decoder Transformer then generates image tokens autoregressively and stops when it emits an end-of-generation token; the generated tokens are then decoded into an image by the VAE.
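The text side of this pipeline is byte-level: ByT5 works on UTF-8 bytes instead of a learned subword vocabulary. The sketch below reimplements that idea in plain Python for illustration only, assuming the ByT5 convention of three reserved ids (padding, end-of-sequence, unknown) followed by the 256 byte values. The function names are hypothetical, and in practice one would load the tokenizer from the Hugging Face transformers library rather than rewrite it.

```python
from typing import List

# Assumed ByT5-style layout: three reserved ids, then one id per byte value.
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
BYTE_OFFSET = 3


def encode_bytes(text: str, add_eos: bool = True) -> List[int]:
    """Map each UTF-8 byte of text to an id, optionally appending EOS."""
    ids = [b + BYTE_OFFSET for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids


def decode_bytes(ids: List[int]) -> str:
    """Invert encode_bytes, skipping reserved ids and anything outside the byte range."""
    raw = bytes(
        i - BYTE_OFFSET
        for i in ids
        if BYTE_OFFSET <= i < BYTE_OFFSET + 256
    )
    return raw.decode("utf-8", errors="ignore")


ids = encode_bytes("hi")
roundtrip = decode_bytes(encode_bytes("héllo"))
```

Because every character reduces to bytes, there is no out-of-vocabulary case: accented letters and punctuation take more than one id but are always representable, which suits a model that has to render arbitrary strings.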
In practice
- Use Eruku for synthetic handwriting generation.
- Apply Eruku for graphic design text styling.
- Integrate Eruku for OCR training data creation.
Topics
- Eruku
- Styled Text Image Generation
- AMD Instinct MI250X
- ROCm
- LUMI Supercomputer
Code references
- Blowing-Up-Groundhogs/Eruku
- Lumi-supercomputer/LUMI-AI-Guide
- huggingface/accelerate
- spaces/carminezacc
Best for: Machine Learning Engineer, AI Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.