CN-NewsTTS Bench: a target-level automatic benchmark for raw-input Chinese news TTS pronunciation
Summary
CN-NewsTTS Bench v0.1 is an open target-level benchmark designed to evaluate the pronunciation accuracy of Chinese news text-to-speech (TTS) systems when processing raw input containing complex written forms. These forms include scores, hyphenated model names, ranges, unit symbols, percentages, English abbreviations, and mixed Chinese-Latin-digit names, which are common in real-world listening scenarios. The benchmark aims to assess systems without reliance on user-side rules, LLM rewriting, SSML hints, or manual edits. The release comprises a 200-record development set, an 800-record public test set, 992 public auto-evaluable targets, fixed transcripts from a three-ASR ensemble, and an automatic target scorer. Initial results for seven product TTS systems show the best system achieving 0.879 strict accuracy, while several others perform below 0.60. The benchmark also provides ASR-route diagnostics, ASR-subset ablations, category-level results, confidence intervals, and provider configuration metadata.
Key takeaway
NLP engineers developing or deploying Chinese news TTS systems should prioritize robust handling of raw text containing complex written forms such as scores and abbreviations. CN-NewsTTS Bench v0.1 reveals large performance gaps: the best of seven product systems reaches 0.879 strict accuracy, while several others fall below 0.60. Evaluate your models against this benchmark to identify weaknesses and to confirm that TTS output reflects the intended spoken form of unedited input, rather than relying on pre-processing or SSML.
Key insights
CN-NewsTTS Bench evaluates raw-input Chinese news TTS systems on complex written forms without external aids.
Principles
- Complex written forms challenge TTS systems significantly.
- Raw text input evaluation reveals true system robustness.
- Transcribing synthesized audio with an ensemble of ASR systems reduces a benchmark's dependence on any single recognizer's errors.
Method
The benchmark uses a 200-record dev set and 800-record public test set with 992 auto-evaluable targets. It employs a three-ASR ensemble for fixed transcripts and an automatic target scorer.
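The target-level scoring idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark's actual scorer: the `Target` structure, the `normalize` helper, substring matching, and the majority-vote rule (`min_votes=2` of three transcripts) are all hypothetical, since the summary does not say how the three ASR transcripts are combined or how "strict" is defined.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """One auto-evaluable target (hypothetical structure)."""
    written: str                # raw written form, e.g. "3-1"
    accepted: tuple[str, ...]   # spoken readings counted as correct


def normalize(text: str) -> str:
    """Drop whitespace and common punctuation so matching is on characters only."""
    drop = set(" \t\n，。、！？,.!?：:；;")
    return "".join(ch for ch in text if ch not in drop)


def target_hit(target: Target, transcripts: list[str], min_votes: int = 2) -> bool:
    """Assumed rule: a target is correct if at least `min_votes` ASR
    transcripts contain one of its accepted readings."""
    votes = 0
    for transcript in transcripts:
        norm = normalize(transcript)
        if any(normalize(reading) in norm for reading in target.accepted):
            votes += 1
    return votes >= min_votes


def strict_accuracy(items: list[tuple[Target, list[str]]], min_votes: int = 2) -> float:
    """Fraction of targets judged correct under the assumed voting rule."""
    if not items:
        return 0.0
    hits = sum(target_hit(t, trs, min_votes) for t, trs in items)
    return hits / len(items)


# Toy data: one score read correctly, one percentage misread.
items = [
    (Target("3-1", ("三比一",)),
     ["国足三比一获胜", "国足三比一获胜。", "国足三杠一获胜"]),
    (Target("5%", ("百分之五",)),
     ["增长了五", "增长百分之五", "增长五"]),
]
print(strict_accuracy(items))  # 0.5
```

Scoring each target separately, rather than the whole sentence, keeps a single misread score or unit from being hidden by an otherwise correct utterance.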
In practice
- Use CN-NewsTTS Bench v0.1 to evaluate Chinese news TTS.
- Focus TTS development on complex written forms.
- Compare your system's strict accuracy against the best reported result of 0.879.
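When comparing an accuracy figure against a reference value such as 0.879, it helps to attach an interval to it. The sketch below uses a percentile bootstrap over per-target outcomes; this is an assumption for illustration, as the summary mentions confidence intervals without saying how they are computed. The function name `bootstrap_ci` and the toy outcome counts are hypothetical.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for mean accuracy.

    `outcomes` is a list of 0/1 values, one per scored target.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample targets with replacement and record each resample's accuracy.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


# Toy example: 800 correct targets out of 1000 (not benchmark data).
outcomes = [1] * 800 + [0] * 200
low, high = bootstrap_ci(outcomes)
print(f"accuracy={sum(outcomes) / len(outcomes):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

If two systems' intervals overlap heavily, a difference in their point accuracies should be read with caution.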
Topics
- Chinese TTS
- News Text-to-Speech
- TTS Benchmarking
- Pronunciation Evaluation
- Raw Text Processing
- ASR Ensemble
Best for: AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.