Styled Text Image Generation with Eruku on AMD

· Source: AMD ROCm Blogs · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Advanced, long

Summary

Eruku is a styled text image generation model, presented at WACV 2026, that tackles the challenge of generating readable, controllable text images that faithfully match a target visual style. Unlike diffusion models, it uses an autoregressive encoder-decoder Transformer, and it emits an explicit end-of-generation token so that generation stops reliably. The model was trained on a synthetic dataset of 20 million samples and then fine-tuned on a smaller dataset of longer samples. It performs competitively against predecessors such as Emuru and against diffusion-based methods such as DiffusionPen. Eruku was developed and trained on the LUMI supercomputer using AMD Instinct™ MI250X GPUs and ROCm™, and its code runs on both CUDA and ROCm™, so development is not tied to one GPU vendor.
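Code that targets both CUDA and ROCm usually relies on the fact that PyTorch's ROCm builds expose AMD GPUs through the same `torch.cuda` namespace, so a `"cuda"` device string works on either vendor. The sketch below is illustrative only and is not taken from the Eruku repository; it assumes a standard PyTorch install, where ROCm builds set `torch.version.hip` to a version string and CUDA builds leave it as `None`. The helper name `detect_backend` is hypothetical.

```python
# Minimal sketch (not from the Eruku codebase): report which GPU backend
# the installed PyTorch build targets. Assumption: ROCm builds of PyTorch
# reuse the torch.cuda API and set torch.version.hip; CUDA builds leave it None.


def detect_backend() -> str:
    """Return 'rocm', 'cuda', or 'cpu' for the current environment."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is no GPU backend to use.
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    # A non-empty HIP version string means this is a ROCm build.
    if getattr(torch.version, "hip", None):
        return "rocm"
    return "cuda"


if __name__ == "__main__":
    backend = detect_backend()
    # The same "cuda" device string addresses NVIDIA GPUs on CUDA builds
    # and AMD GPUs on ROCm builds.
    device = "cuda" if backend in ("cuda", "rocm") else "cpu"
    print(f"backend={backend}, device={device}")
```

Because the device string is shared, model and training code written against `torch.cuda` generally needs no vendor-specific branches; the backend check is only useful for logging or for enabling vendor-specific tuning.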

Key takeaway

For AI engineers and ML scientists working on text image generation, Eruku offers an autoregressive approach to high-fidelity styled text. Consider running Eruku on AMD Instinct™ MI250X GPUs with ROCm™ for scalable training and inference, especially for tasks that need precise control over both text content and visual style, such as synthetic document creation or handwriting synthesis.

Key insights

Eruku is an autoregressive Transformer for styled text image generation that is competitive with diffusion models in text control and style fidelity.

Principles

Method

Eruku tokenizes the input text with ByT5 and encodes the style image with Emuru's VAE encoder. An encoder-decoder Transformer then generates image tokens autoregressively until it emits an end-of-generation token, and the VAE decodes the generated tokens into the output image.
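Two parts of this pipeline are easy to show in isolation: byte-level text tokenization and a generation loop that stops on an explicit end-of-generation (EOG) token. The sketch below is a toy illustration under stated assumptions, not Eruku's implementation. The ByT5-style encoding assumes the usual ByT5 convention (ids 0, 1, 2 reserved for `<pad>`, `</s>`, `<unk>`; each UTF-8 byte `b` mapped to `b + 3`). The step function `toy_step` is a hypothetical stand-in for one Transformer decoder pass, and image tokens are plain integers purely for illustration; the style encoder and VAE decoder are omitted.

```python
from typing import Callable, List, Sequence

# Assumed ByT5 conventions: ids 0/1/2 are <pad>/</s>/<unk>,
# and every UTF-8 byte b is mapped to id b + 3.
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
BYTE_OFFSET = 3


def byt5_encode(text: str) -> List[int]:
    """Byte-level tokenization in the style of ByT5, with a trailing </s>."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def generate_until_eog(
    step: Callable[[Sequence[int]], int],
    eog_id: int,
    max_len: int,
) -> List[int]:
    """Greedy autoregressive loop that stops at an explicit EOG token.

    `step` stands in for one decoder pass: it sees the tokens generated so
    far and returns the next one. `max_len` is a safety cap in case the
    model never emits EOG. The EOG token itself is not included in the output.
    """
    generated: List[int] = []
    for _ in range(max_len):
        nxt = step(generated)
        if nxt == eog_id:
            break  # the model signals that the text image is complete
        generated.append(nxt)
    return generated


if __name__ == "__main__":
    text_ids = byt5_encode("Hi")
    print(text_ids)  # [75, 108, 1]

    # Hypothetical toy "decoder": one image token per text byte, then EOG.
    EOG = -1
    n_content = len(text_ids) - 1  # exclude the trailing </s>

    def toy_step(prefix: Sequence[int]) -> int:
        return 100 + len(prefix) if len(prefix) < n_content else EOG

    image_tokens = generate_until_eog(toy_step, eog_id=EOG, max_len=64)
    print(image_tokens)  # [100, 101]
    # In the full pipeline, these tokens would go to the VAE decoder.
```

The explicit EOG check lets the model decide the output length itself, while `max_len` only guards against runaway generation; without an EOG token, a generator has to infer where the text ends from the content it produces, which is less reliable.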

Best for: Machine Learning Engineer, AI Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.