OpenAI and Microsoft Corp. have launched advanced AI models optimized for speech generation, significantly enhancing voice interaction capabilities. OpenAI’s gpt-realtime is its most sophisticated voice model to date, delivering natural-sounding speech and the ability to adjust tone and language dynamically. It’s designed for developers, enabling customized applications like technical support assistants that can reference knowledge base articles. OpenAI also introduced a new image upload tool for customer service chatbots, allowing users to troubleshoot by uploading screenshots. Developers can access gpt-realtime through the OpenAI Realtime API, which is now generally available with new features, including prompt saving. Microsoft unveiled MAI-Voice-1, an efficient voice model for its Copilot assistant, capable of generating audio rapidly on a single GPU. Additionally, the MAI-1-preview model utilizes a mixture-of-experts architecture to minimize hardware use. Both companies emphasize leveraging specialized models to unlock greater value in AI applications.
Source link