Styled Text Image Generation with Eruku on AMD

2026-04-24 · Source: AMD ROCm Blogs · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Advanced, long

Summary

Eruku is a styled text image generation model, presented at WACV 2026, that addresses the challenge of generating readable, controllable text images that faithfully match a target visual style. It uses an autoregressive encoder-decoder Transformer architecture, in contrast to diffusion models, and emits an explicit end-of-generation token, which makes generation more reliable. The model was trained on a synthetic dataset of 20 million samples and then fine-tuned on a smaller dataset of longer samples. It performs competitively against both its predecessor, Emuru, and diffusion-based methods such as DiffusionPen. Eruku was developed and trained on the LUMI supercomputer using AMD Instinct™ MI250X GPUs and ROCm™, and its code runs on both CUDA and ROCm™, which keeps the choice of development hardware flexible.
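
The CUDA/ROCm portability mentioned above usually relies on PyTorch's ROCm builds, which expose AMD GPUs through the same torch.cuda API. The sketch below assumes a PyTorch codebase (the summary does not name the framework) and uses a hypothetical helper, select_device, that is not part of Eruku. It only reports which backend is active, using torch.version.hip, which holds a version string on ROCm builds and is None on CUDA builds.

```python
import importlib.util


def select_device() -> tuple[str, str]:
    """Return (device_string, backend_name) for the current PyTorch install.

    Hypothetical helper for illustration. ROCm builds of PyTorch reuse the
    torch.cuda API, so the device string is "cuda" on both NVIDIA (CUDA)
    and AMD (ROCm/HIP) GPUs; torch.version.hip tells the two apart.
    """
    if importlib.util.find_spec("torch") is None:
        return "cpu", "none"  # PyTorch is not installed
    import torch

    if not torch.cuda.is_available():
        return "cpu", "none"  # no usable GPU found
    # Non-empty version string on ROCm builds, None on CUDA builds.
    backend = "rocm" if getattr(torch.version, "hip", None) else "cuda"
    return "cuda", backend


if __name__ == "__main__":
    device, backend = select_device()
    print(f"device={device} backend={backend}")
```

Because the device string is "cuda" on both vendors, the same model.to(device) call works unchanged on MI250X GPUs.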

Key takeaway

For AI engineers and ML scientists working on text image generation, Eruku offers a robust autoregressive approach to high-fidelity styled text. Consider deploying it on AMD Instinct™ MI250X GPUs with ROCm™ for scalable training and inference, especially for tasks that require precise control over both text content and visual style, such as synthetic document creation or handwriting synthesis.

Key insights

Eruku is an autoregressive Transformer for styled text image generation that is competitive with diffusion models in text control and style fidelity.

Principles

Autoregressive models are well suited to variable-length text image generation.
Explicit end-of-generation tokens enhance content adherence.
Synthetic datasets can effectively train robust text generation models.
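
The first two principles can be illustrated with a toy greedy decoding loop in which the model itself decides the output length by emitting an end-of-generation token, rather than the length being fixed in advance. This is a generic sketch, not Eruku's decoder. The EOG id, the step callable, and the max_len safety cap are assumptions for illustration, and a real model would emit image tokens rather than character codes.

```python
from typing import Callable, List, Sequence

EOG = -1  # hypothetical id for the end-of-generation token


def generate_until_eog(
    step: Callable[[Sequence[int]], int],
    max_len: int = 256,
) -> List[int]:
    """Greedy autoregressive decoding that stops on an end-of-generation token.

    `step` maps the tokens generated so far to the next token. Generation
    ends when `step` returns EOG (the model decides the length) or when
    `max_len` tokens have been produced (a safety cap).
    """
    out: List[int] = []
    while len(out) < max_len:
        nxt = step(out)
        if nxt == EOG:
            break
        out.append(nxt)
    return out


def make_toy_step(text: str) -> Callable[[Sequence[int]], int]:
    """Toy 'model': emits one token per character of `text`, then EOG."""
    def step(prefix: Sequence[int]) -> int:
        i = len(prefix)
        return ord(text[i]) if i < len(text) else EOG
    return step


print(generate_until_eog(make_toy_step("abc")))  # [97, 98, 99]
```

Without the end token, the loop would only stop at max_len, which is the failure mode an explicit end-of-generation token is meant to avoid.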

Method

Eruku tokenizes text with ByT5 and processes style images with Emuru's VAE encoder. An encoder-decoder Transformer then generates image tokens autoregressively until it emits an end-of-generation token, and the VAE decodes the generated tokens into the final image.
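
The method can be outlined as a four-stage pipeline. In this sketch, byt5_ids follows the byte-level id scheme of the Hugging Face ByT5Tokenizer (each UTF-8 byte shifted by 3, with </s> = 1). The callables vae_encode, transformer_generate, and vae_decode are hypothetical stand-ins for Emuru's VAE encoder, the encoder-decoder Transformer, and the VAE decoder. How Eruku actually feeds the style latents to the Transformer is not specified here.

```python
from typing import Callable, List, Sequence

BYT5_OFFSET = 3  # ids 0, 1, 2 are reserved for <pad>, </s>, <unk>
BYT5_EOS = 1


def byt5_ids(text: str) -> List[int]:
    """ByT5-style tokenization: one id per UTF-8 byte, shifted by 3, plus </s>."""
    return [b + BYT5_OFFSET for b in text.encode("utf-8")] + [BYT5_EOS]


def generate_styled_text(
    text: str,
    style_image: object,
    vae_encode: Callable[[object], Sequence[float]],
    transformer_generate: Callable[[List[int], Sequence[float]], List[object]],
    vae_decode: Callable[[List[object]], object],
) -> object:
    """Wire the four stages together; every callable is a hypothetical stand-in."""
    text_ids = byt5_ids(text)                             # 1. text -> byte-level ids
    style = vae_encode(style_image)                       # 2. style image -> VAE latents
    image_tokens = transformer_generate(text_ids, style)  # 3. autoregressive, until end token
    return vae_decode(image_tokens)                       # 4. image tokens -> image
```

Byte-level tokenization covers any Unicode string without out-of-vocabulary tokens: for example, byt5_ids("é") yields [198, 172, 1], because "é" is the two UTF-8 bytes 195 and 169.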

In practice

Use Eruku for synthetic handwriting generation.
Apply Eruku for graphic design text styling.
Integrate Eruku for OCR training data creation.
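
For the OCR use case, a text-conditioned generator has a useful property: every image's transcription is known exactly, because it is the input string. Below is a minimal sketch, assuming a hypothetical render wrapper that calls the generator, saves the image, and returns its path. The build_ocr_manifest function and its TSV layout are illustrative choices, not an Eruku or OCR-toolkit format.

```python
import csv
import io
from typing import Callable, Iterable


def build_ocr_manifest(texts: Iterable[str], render: Callable[[str], str]) -> str:
    """Return a TSV manifest of (image_path, text) rows.

    `render` is a hypothetical wrapper around a styled-text generator that
    writes an image for `text` and returns its path. Each label is exact
    by construction because it is the string the image was generated from.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["image_path", "text"])  # header row
    for text in texts:
        writer.writerow([render(text), text])
    return buf.getvalue()


if __name__ == "__main__":
    fake_render = lambda t: f"out/{t.lower()}.png"  # stand-in, writes nothing
    print(build_ocr_manifest(["Hello", "World"], fake_render), end="")
```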

Topics

Eruku
Styled Text Image Generation
AMD Instinct MI250X
ROCm
LUMI Supercomputer

Best for: Machine Learning Engineer, AI Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.