Advancing WordArt-Oriented Scene Text Recognition: Datasets and Methods

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

WordArt-Oriented Scene Text Recognition (WATER) targets highly customized artistic text, which existing Scene Text Recognition (STR) methods recognize poorly because they are built for regular text and fixed-template inputs. The researchers constructed WATER-S, a synthetic dataset of 2M samples that expands the scale of available WordArt training data by hundreds of times. WATER-S combines images rendered by an upgraded SynthWordArt pipeline with images generated by using Qwen3-VL to mine prompts and Z-Image to synthesize images. They also proposed WATERec, a model that pairs a visual encoder accepting arbitrary-shaped inputs with an autoregressive decoder for complex layouts. Combined with WATER-R (reorganized real STR data), this approach achieved 90.40% accuracy on WordArt-Bench, significantly surpassing both general-purpose and OCR-specialized vision-language models.
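The summary does not say how WATERec's encoder accepts arbitrary-shaped inputs. One common approach in vision transformers is to keep the image's aspect ratio and tile it into a variable-size patch grid under a fixed token budget, instead of resizing every crop to a single fixed template. The sketch below illustrates only that general idea; `patch_grid`, the 16-pixel patch size, and the 256-patch budget are hypothetical choices, not values from the paper.

```python
import math


def patch_grid(height: int, width: int, patch: int = 16, max_patches: int = 256) -> tuple[int, int]:
    """Return (rows, cols) for a patch grid that roughly keeps the input aspect ratio.

    Hypothetical helper for illustration; WATERec's actual encoder may differ.
    """
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    # Patches needed at native resolution (fractional patches counted).
    native = (height / patch) * (width / patch)
    # Shrink uniformly only when the native grid exceeds the token budget.
    scale = min(1.0, math.sqrt(max_patches / native))
    rows = max(1, math.floor(height * scale / patch))
    cols = max(1, math.floor(width * scale / patch))
    # Extreme aspect ratios can force one side up to 1 patch; cap so rows * cols stays within budget.
    rows = min(rows, max_patches)
    cols = min(cols, max_patches // rows)
    return rows, cols


# A wide banner keeps its shape; a large square image is scaled down to fit the budget.
print(patch_grid(32, 320))    # (2, 20)
print(patch_grid(640, 640))   # (16, 16)
```

The image would then be resized to `rows * patch` by `cols * patch` pixels before patch embedding, so a long horizontal word and a compact square logo each get a grid that matches their shape.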

Key takeaway

For machine learning engineers building OCR for highly stylized or artistic text, general-purpose STR methods are insufficient. Consider specialized training data such as WATER-S and models such as WATERec, which supports arbitrary-shaped inputs and complex layouts. This combination reaches 90.40% accuracy on WordArt-Bench, surpassing general-purpose vision-language models.
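The summary reports accuracy as a single percentage without stating the evaluation protocol. STR benchmarks commonly report word-level accuracy: the share of test images whose full predicted string exactly matches the ground truth. A minimal sketch, assuming exact-match scoring with optional case folding; whether WordArt-Bench folds case, strips punctuation, or normalizes Unicode is not stated in the summary, and `word_accuracy` is a hypothetical helper.

```python
def word_accuracy(predictions: list[str], labels: list[str], case_sensitive: bool = False) -> float:
    """Percentage of samples whose prediction exactly matches its label (hypothetical helper)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    def normalize(text: str) -> str:
        # Assumed normalization: optional case folding only.
        return text if case_sensitive else text.casefold()

    correct = sum(normalize(p) == normalize(t) for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)


preds = ["HELLO", "world", "cafe"]
gold = ["hello", "world", "café"]
print(round(word_accuracy(preds, gold), 2))                       # 66.67
print(round(word_accuracy(preds, gold, case_sensitive=True), 2))  # 33.33
```

Under this assumed metric, 90.40% means roughly 904 of every 1,000 benchmark images are transcribed exactly, with no partial credit for near misses such as a dropped accent.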

Key insights

Recognizing highly customized WordArt demands specialized datasets and models beyond general Scene Text Recognition.

Method

Construct WATER-S from two sources: images rendered by an upgraded SynthWordArt pipeline, and images generated with Qwen3-VL (prompt mining) and Z-Image (image synthesis). Build WATERec from a visual encoder for arbitrary-shaped inputs and an autoregressive decoder.
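The summary does not describe WATERec's decoder internals. The defining property of an autoregressive decoder is that it emits the transcription one token at a time, with each step conditioned on the tokens already produced (and, in a real model, on the encoder's visual features). A minimal greedy-decoding sketch, assuming character-level tokens; `greedy_decode`, `toy_step`, and the `<s>`/`</s>` markers are hypothetical, and `toy_step` only stands in for a real decoder forward pass.

```python
from typing import Callable, Sequence

BOS, EOS = "<s>", "</s>"


def greedy_decode(step: Callable[[Sequence[str]], dict[str, float]], max_len: int = 32) -> str:
    """Emit the highest-scoring next token until EOS or max_len tokens."""
    tokens: list[str] = [BOS]
    for _ in range(max_len):
        scores = step(tokens)  # next-token scores conditioned on the prefix so far
        next_token = max(scores, key=scores.__getitem__)
        if next_token == EOS:
            break
        tokens.append(next_token)
    return "".join(tokens[1:])  # drop BOS and join character tokens


def toy_step(target: str) -> Callable[[Sequence[str]], dict[str, float]]:
    """Hypothetical stand-in for a decoder forward pass that 'reads' a fixed string."""
    vocab = sorted(set(target)) + [EOS]

    def step(prefix: Sequence[str]) -> dict[str, float]:
        position = len(prefix) - 1  # characters emitted so far (excluding BOS)
        expected = target[position] if position < len(target) else EOS
        scores = {token: 0.0 for token in vocab}
        scores[expected] = 1.0
        return scores

    return step


print(greedy_decode(toy_step("WATER")))             # WATER
print(greedy_decode(toy_step("WATER"), max_len=3))  # WAT
```

Greedy search is the simplest choice; beam search keeps several candidate prefixes per step and may help on ambiguous stylized glyphs, at extra compute cost.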

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.