Wednesday, December 3, 2025

PolyNorm: Enhancing Text-to-Speech with Few-Shot LLM Text Normalization

Text Normalization (TN) is an essential component of Text-to-Speech (TTS) systems: it converts written text into its spoken form. Traditional TN methods are often labor-intensive, hard to scale, and limited in language coverage, especially in low-resource settings. To address these issues, we introduce PolyNorm, a prompt-based TN approach built on Large Language Models (LLMs). It reduces the dependence on manually crafted rules and extends to new languages with less human oversight. We also provide a language-agnostic pipeline for automatic data curation and evaluation, which allows scalable experimentation across multiple languages. Experiments across eight languages show significant reductions in word error rate (WER) compared to conventional production-grade systems. The approach streamlines TN, makes it easier to build robust TTS systems for diverse languages, and opens avenues for further research.
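For readers unfamiliar with the metric, the sketch below shows one common way to compute WER: the word-level edit distance between a reference spoken form and a system's normalized output, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration; the abstract does not describe the exact scoring procedure used for PolyNorm.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace. Real evaluations may
    apply language-specific tokenization and casing rules first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the written input "3:30 pm" should be spoken as
# "three thirty p m"; the system output gets one word wrong.
reference = "the meeting is at three thirty p m"
hypothesis = "the meeting is at three thirteen p m"
print(word_error_rate(reference, hypothesis))
```

Here one of the eight reference words is substituted, so the score is 1/8. A lower WER means the normalized output is closer to the reference spoken form.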
