TEN has open-sourced the ONNX model and preprocessing code for its Voice Activity Detection (VAD) system, which allows TEN VAD to be deployed on a wide range of platforms. TEN VAD is part of the TEN suite, a set of open-source tools for building real-time multimodal conversational voice agents. According to the project, it offers higher accuracy, lower computational complexity, and lower latency than existing solutions such as WebRTC VAD and Silero VAD, and it detects transitions between speech and non-speech quickly while running efficiently across diverse hardware. It supports Linux, Windows, and macOS, among other operating systems, and offers configurations for different audio sampling rates. Applications integrate TEN VAD through dynamic libraries, with Python and JavaScript interfaces available. The repository includes installation and usage documentation along with benchmark results. The project is released under the Apache 2.0 license.
Source: Introducing TEN’s Low-Latency, High-Performance Voice Activity Detector (VAD) Framework
