ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion

2026-06-18 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

ReNikud is a method for Hebrew grapheme-to-phoneme (G2P) conversion that targets a core difficulty of the language's abjad writing system: vowels are largely left unwritten, so a single spelling can have several pronunciations. Traditional G2P approaches are limited by scarce vocalization data, do not specify lexical stress, and reflect formal rules rather than actual spoken pronunciation. ReNikud addresses these limitations with two key ideas. First, weak audio supervision: a phoneme-based automatic speech recognition (ASR) pipeline pseudo-labels thousands of hours of unlabeled Hebrew audio, yielding phonemic transcriptions that reflect natural spoken norms. Second, a pseudo-vocalization architecture predicts IPA phonemes at each character position, enforcing character-level alignment between the input text and its pronunciation. The method surpasses the previous state of the art on existing Hebrew G2P benchmarks and on the new MILIM benchmark; code and models are slated for release.
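The ambiguity described above is easy to demonstrate. The sketch below uses only Python's standard `unicodedata` module to strip vowel points (niqqud) from three vocalized words; all three collapse to the same unvocalized spelling, which is the only input a G2P model for ordinary Hebrew text receives. The example words (sefer 'book', safar 'counted', sapar 'barber') are standard illustrations chosen here, not examples taken from the paper, and `strip_niqqud` is a hypothetical helper.

```python
import unicodedata


def strip_niqqud(text: str) -> str:
    """Hypothetical helper: drop Hebrew vowel points and other combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Vocalized words written as escapes to avoid right-to-left display issues.
SEFER = "\u05e1\u05b5\u05e4\u05b6\u05e8"        # /ˈsefeʁ/  'book'
SAFAR = "\u05e1\u05b8\u05e4\u05b7\u05e8"        # /saˈfaʁ/  'counted'
SAPAR = "\u05e1\u05b7\u05e4\u05bc\u05b8\u05e8"  # /saˈpaʁ/  'barber'

skeletons = {strip_niqqud(w) for w in (SEFER, SAFAR, SAPAR)}
# All three readings share one consonant skeleton (samekh, pe, resh).
print(skeletons == {"\u05e1\u05e4\u05e8"})  # True
```

Because the vowels, the stress, and even the p/f distinction all disappear, the model must recover them from context.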

Key takeaway

For NLP engineers building Hebrew text-to-speech or other speech technologies, ReNikud is a notable advance in grapheme-to-phoneme conversion. Its audio-supervised pseudo-labeling addresses the scarcity of vocalized training data, and its character-aligned architecture addresses the ambiguity inherent in Hebrew's abjad script. Consider adopting its methodology, or the open-source models once they are released, to improve the naturalness and accuracy of Hebrew speech applications.

Key insights

ReNikud leverages weak audio supervision and a character-aligned pseudo-vocalization architecture for accurate Hebrew Grapheme-to-Phoneme conversion.

Principles

Weak audio supervision can mitigate data scarcity for G2P.
ASR pseudo-labeling captures natural spoken pronunciation norms.
Character-level alignment improves abjad G2P accuracy.
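Character-level alignment can be illustrated with a toy representation in which every Hebrew letter carries exactly one IPA chunk (possibly empty, for a silent letter) and the word's pronunciation is their concatenation. This is a minimal sketch under stated assumptions: the label vocabulary, the placement of the stress mark "ˈ" inside a chunk, and the helpers `render` and `decode` are hypothetical, and ReNikud's actual output head may differ. The point is structural: the same letters receive different per-position labels for different readings, and an output with the wrong number of labels is rejected by construction.

```python
from typing import List, Sequence


def render(letters: str, chunks: Sequence[str]) -> str:
    """Join per-letter IPA chunks; reject labels that do not align 1:1 with letters."""
    if len(letters) != len(chunks):
        raise ValueError(f"expected {len(letters)} labels, got {len(chunks)}")
    return "".join(chunks)


def decode(letters: str, logits: Sequence[Sequence[float]], vocab: List[str]) -> str:
    """Pick the highest-scoring chunk at each letter position, then render."""
    chunks = [vocab[max(range(len(row)), key=lambda i: row[i])] for row in logits]
    return render(letters, chunks)


WORD = "\u05e1\u05e4\u05e8"  # unvocalized samekh-pe-resh
# Toy label set (hypothetical); "" would let a letter be silent.
VOCAB = ["", "sa", "ˈse", "fe", "ˈfa", "ʁ"]

print(render(WORD, ["ˈse", "fe", "ʁ"]))  # ˈsefeʁ  (sefer, 'book')
print(render(WORD, ["sa", "ˈfa", "ʁ"]))  # saˈfaʁ  (safar, 'counted')

# Toy scores standing in for a model's per-position outputs.
scores = [
    [0.0, 0.9, 0.1, 0.0, 0.0, 0.0],  # position 0 -> "sa"
    [0.0, 0.0, 0.0, 0.2, 0.8, 0.0],  # position 1 -> "ˈfa"
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.9],  # position 2 -> "ʁ"
]
print(decode(WORD, scores, VOCAB))  # saˈfaʁ
```

Framing G2P as per-position classification rather than free sequence generation means the model cannot skip or duplicate letters, which is one plausible reason alignment helps in an abjad.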

Method

ReNikud employs a phoneme-based ASR pseudo-labeling pipeline on thousands of hours of unlabeled Hebrew audio to generate phonemic transcriptions, then uses a pseudo-vocalization architecture to predict IPA phonemes at each character position.
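The pseudo-labeling step can be sketched as a loop that runs a phoneme-level recognizer over each audio clip and keeps the resulting (text, phonemes) pairs as weak training targets. Several details here are assumptions, not taken from the summary: that each clip comes with an orthographic transcript (the summary does not say where the text side originates), that the recognizer reports a confidence score, and that low-confidence outputs are dropped, which is a common safeguard in pseudo-labeling. `Clip`, `PhonemeASR`, `pseudo_label`, and `min_conf` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Clip:
    audio_path: str
    text: str  # orthographic transcript; assumed to be available


# Hypothetical recognizer interface: audio path -> (IPA string, confidence in [0, 1]).
PhonemeASR = Callable[[str], Tuple[str, float]]


def pseudo_label(
    clips: Iterable[Clip], asr: PhonemeASR, min_conf: float = 0.8
) -> List[Tuple[str, str]]:
    """Keep (text, phonemes) pairs whose recognizer confidence clears the threshold."""
    pairs: List[Tuple[str, str]] = []
    for clip in clips:
        phonemes, conf = asr(clip.audio_path)
        if conf >= min_conf and phonemes.strip():
            pairs.append((clip.text, phonemes))
    return pairs


# Usage with a stub recognizer standing in for a real phoneme ASR model.
fake_outputs = {
    "a.wav": ("ʃaˈlom", 0.95),  # confident output: kept
    "b.wav": ("toˈda", 0.40),   # low confidence: dropped
}
clips = [
    Clip("a.wav", "\u05e9\u05dc\u05d5\u05dd"),  # shalom
    Clip("b.wav", "\u05ea\u05d5\u05d3\u05d4"),  # toda
]
labeled = pseudo_label(clips, lambda path: fake_outputs[path])
print(len(labeled))  # 1
```

The kept pairs are still word- or sentence-level; turning them into one label per character, as the pseudo-vocalization architecture requires, needs a separate alignment step that this sketch does not attempt.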

In practice

Improve naturalness of Hebrew text-to-speech systems.
Advance development of Hebrew speech technologies.

Topics

Hebrew G2P
Text-to-Speech
Automatic Speech Recognition
Pseudo-labeling
Phoneme Prediction
Abjad Systems

Best for: Research Scientist, AI Scientist, NLP Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.