Saturday, February 21, 2026

Trending on Hugging Face: The 40-Second Open-Source Speech Model

NineNineSix has launched Kani TTS 2, an open-source text-to-speech (TTS) model that generates longer and more stable audio than its predecessor, with a focus on high-quality speech AI for underrepresented languages. This version produces up to 40 seconds of continuous speech, more than double the previous limit, and is trending on Hugging Face as one of the top TTS models.

Kani TTS 2 keeps its lightweight architecture while supporting zero-shot voice cloning, which lets developers replicate a speaker's voice from a brief audio sample. The full pretraining code is available, so other organizations can train TTS systems for their own languages, especially low-resource ones.

The model has 400 million parameters, was trained on 10,000 hours of speech data, and needs about 3 GB of GPU memory for deployment. That small footprint positions NineNineSix as a key player in democratizing speech AI and in addressing the lack of language inclusion in AI technologies.
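
For a rough sense of how a 400-million-parameter model relates to a roughly 3 GB deployment footprint, the short sketch below estimates the memory taken by the weights alone. The numeric precisions are assumptions, since the article does not say which one Kani TTS 2 uses, and the function name is made up for illustration. Whatever the precision, the weights would account for only part of the 3 GB; the rest presumably goes to activations, the audio decoder, and runtime overhead, which this estimate does not model.

```python
# Back-of-envelope estimate of weight memory for a 400M-parameter model.
# The precisions below are assumptions; the article does not state which is used.

def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the memory needed to store the weights, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

PARAMS = 400_000_000  # parameter count reported in the article

for precision, size in {"float32": 4, "float16 or bfloat16": 2}.items():
    print(f"{precision}: {weight_memory_gb(PARAMS, size):.1f} GB")
```

Under these assumptions the weights occupy about 1.6 GB in 32-bit precision or 0.8 GB in 16-bit precision, which is consistent with the reported 3 GB figure being a total that includes more than the weights.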
