OpenAI and Cerebras have forged a multi-year partnership to deploy 750 MW of Cerebras wafer-scale AI systems, beginning in 2026 and continuing through 2028. The deal targets inference performance: by packing compute, memory, and bandwidth onto a single wafer-scale chip, Cerebras systems deliver inference speeds up to 15 times faster than traditional GPU-based setups, dramatically cutting response times. OpenAI’s Head of Compute Infrastructure, Sachin Katti, highlighted the importance of dedicated low-latency hardware for real-time AI interactions, while Cerebras CEO Andrew Feldman likened the partnership to the broadband revolution for the internet. On Cerebras hardware, models such as GPT-OSS-120B and Llama 3.1-70B reach 3,000 and 2,100 tokens per second, respectively, pointing to a new class of responsive AI applications and interactions.
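To put those throughput figures in perspective, here is a minimal back-of-envelope sketch converting tokens per second into user-perceived response time. The 1,000-token response length and the 200 tok/s GPU baseline are illustrative assumptions (the baseline is chosen only so that it lines up with the cited 15x speedup), not figures from the announcement.

```python
# Back-of-envelope: what tokens/sec means for user-perceived latency.
# Only the 3,000 and 2,100 tok/s rates come from the announcement;
# the GPU baseline and response length are illustrative assumptions.

RATES_TOK_PER_S = {
    "Cerebras GPT-OSS-120B": 3000,    # cited in the announcement
    "Cerebras Llama 3.1-70B": 2100,   # cited in the announcement
    "Hypothetical GPU baseline": 200, # assumed, implies the ~15x gap
}

RESPONSE_TOKENS = 1000  # assumed length of a long chat response

for name, rate in RATES_TOK_PER_S.items():
    seconds = RESPONSE_TOKENS / rate
    print(f"{name:26s} {rate:5d} tok/s -> "
          f"{seconds:.2f} s per {RESPONSE_TOKENS}-token response")
```

Under these assumptions, a 1,000-token answer streams in roughly a third of a second on Cerebras versus about five seconds on the baseline, which is the practical difference the low-latency pitch is about.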
