An Empirical Study on Learning Latent Representations for Emotional Speech Synthesis

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Speech Technology · Depth: Advanced, quick

Summary

An empirical study addresses emotional speech synthesis (ESS) for the VLSP 2022 task, which aims to generate human-like, natural-sounding speech with a desired emotional expression from input text. The study integrates a speaker embedding and a prosody bottleneck into the FastSpeech 2 architecture. This lets the system handle two sub-tasks: generating emotional speech for a single speaker, and transferring speaking styles from a source speaker to a target speaker using neutral, non-expressive data while retaining the target speaker's identity. The work targets expressiveness control, which remains a significant challenge for deep learning-based text-to-speech (TTS) systems.

Key takeaway

For machine learning engineers building expressive text-to-speech (TTS) systems, the study describes adding a speaker embedding and a prosody bottleneck to FastSpeech 2. The resulting architecture can generate emotional speech for an individual speaker or transfer a speaking style between speakers while preserving the target speaker's voice identity. Consider it when a task calls for explicit control over emotion or speaking style in synthesized speech.

Key insights

Integrating a speaker embedding and a prosody bottleneck into FastSpeech 2 supports both emotional speech synthesis and cross-speaker style transfer.

Method

The method integrates a speaker embedding and a prosody bottleneck into FastSpeech 2. The combined model generates emotional speech for a single speaker and transfers speaking styles from a source speaker to a target speaker while preserving the target speaker's identity.
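The summary does not specify how the two conditioning signals are wired into the model, so the following is only a minimal NumPy sketch of one common arrangement, not the authors' implementation. It assumes the speaker embedding is a lookup table, the prosody bottleneck is a narrow projection of a time-averaged reference mel-spectrogram, and both vectors are added to the encoder output before the FastSpeech 2 variance adaptor. All names, dimensions, and random weights (`speaker_table`, `prosody_code`, `condition`, `D_BOTTLENECK`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 8        # encoder hidden size (hypothetical)
N_MELS = 4         # mel bins in the reference spectrogram (hypothetical)
D_BOTTLENECK = 2   # deliberately narrow prosody code (hypothetical)
N_SPEAKERS = 3

# Randomly initialised parameters stand in for trained weights.
speaker_table = rng.normal(size=(N_SPEAKERS, D_MODEL))
w_down = rng.normal(size=(N_MELS, D_BOTTLENECK))
w_up = rng.normal(size=(D_BOTTLENECK, D_MODEL))


def prosody_code(reference_mel):
    """Compress a reference mel-spectrogram (frames, N_MELS) to a small vector."""
    pooled = reference_mel.mean(axis=0)   # average over time -> (N_MELS,)
    return np.tanh(pooled @ w_down)       # narrow bottleneck -> (D_BOTTLENECK,)


def condition(encoder_out, speaker_id, reference_mel):
    """Add speaker and prosody vectors to every encoder position."""
    speaker_vec = speaker_table[speaker_id]            # (D_MODEL,)
    prosody_vec = prosody_code(reference_mel) @ w_up   # (D_MODEL,)
    return encoder_out + speaker_vec + prosody_vec     # broadcast over phonemes


# Style transfer: prosody comes from the source speaker's emotional reference,
# identity comes from the target speaker's embedding.
encoder_out = rng.normal(size=(5, D_MODEL))            # 5 phonemes of input text
source_emotional_mel = rng.normal(size=(20, N_MELS))   # 20 reference frames
target_speaker_id = 1

transferred = condition(encoder_out, target_speaker_id, source_emotional_mel)
```

Keeping the bottleneck narrow is the usual design motivation in this kind of setup: a small prosody code has little capacity to carry the source speaker's timbre, so identity is left to the speaker embedding. Whether the study uses this exact wiring is not stated in the summary.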

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.