OpenAI has updated its Realtime API with three new model snapshots aimed at improving transcription, speech synthesis, and function calling accuracy. The gpt-4o-mini-transcribe model reduces hallucinations by 89% compared to whisper-1. For text-to-speech applications, the gpt-4o-mini-tts variant cuts the word error rate by 35%, improving clarity and accuracy. The gpt-realtime-mini model shows a 22% improvement in instruction adherence and a 13% boost in function calling, making it well suited to voice assistant applications. OpenAI has also expanded support for languages including Chinese, Japanese, Indonesian, Hindi, Bengali, and Italian, broadening global accessibility. Together, these updates underscore OpenAI's focus on reliable, efficient audio processing across a range of applications.
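To illustrate where these snapshots fit into the existing SDK surface, here is a minimal sketch using the official openai Python package; the audio file name, voice choice, and sample text are placeholder assumptions, while the model identifiers are the snapshot names quoted above.

```python
# Minimal sketch (assumes the `openai` Python SDK is installed and
# OPENAI_API_KEY is set; "meeting.wav" and the "alloy" voice are placeholders).
from openai import OpenAI

client = OpenAI()

# Transcription with the snapshot aimed at fewer hallucinations.
with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",
        file=audio_file,
    )
print(transcript.text)

# Text-to-speech with the lower word-error-rate snapshot.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input="Your order has shipped and should arrive on Thursday.",
)
speech.write_to_file("reply.mp3")
```

The gpt-realtime-mini snapshot, by contrast, is used through the Realtime (WebSocket/WebRTC) interface for live voice-assistant sessions rather than one-shot calls like those above.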