NVIDIA is addressing gaps in AI language support with its Granary dataset and two new models, which improve speech recognition and translation for 25 European languages, including Croatian, Estonian, and Maltese. Granary is an open-source corpus of approximately one million hours of audio, intended for applications such as multilingual chatbots and real-time translation services. The dataset’s processing pipeline uses NVIDIA’s NeMo Speech Data Processor, which lets researchers convert unlabeled audio into a structured format without labor-intensive manual annotation, making it easier to cover underrepresented languages. The new models, Canary-1b-v2 and Parakeet-tdt-0.6b-v3, offer fast, high-quality transcription; Canary-1b-v2 also expands language support and delivers speed and accuracy comparable to larger models. For developers, these resources make it faster to build and scale speech AI applications. The Granary dataset and models are available on Hugging Face and GitHub.