OpenAI has launched GPT-5.3-Codex-Spark, a fast AI coding model optimized for Cerebras hardware that generates over 1,000 tokens per second. It is the first OpenAI model not served on Nvidia infrastructure, and its latency-first serving is aimed at developers who need immediate feedback.
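To put the throughput figure in perspective, a short back-of-the-envelope sketch: the 1,000 tokens-per-second rate comes from the announcement, but the slower comparison rates and the 500-token edit size below are illustrative assumptions, not measurements.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream num_tokens at a given decode rate."""
    return num_tokens / tokens_per_second

# Hypothetical code edit of 500 tokens, compared across decode rates.
# 50 and 250 tok/s are assumed stand-ins for conventional GPU serving;
# 1,000 tok/s is the rate reported for Codex-Spark on Cerebras.
edit_size = 500
for rate in (50, 250, 1000):
    print(f"{rate:>5} tok/s -> {generation_seconds(edit_size, rate):.1f} s")
```

At the reported rate, a 500-token edit streams back in half a second, which is roughly the threshold where generation stops feeling like a wait and starts feeling interactive.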
The partnership with Cerebras, announced in January, is meant to supply substantial computing capacity, and Codex-Spark is its first tangible outcome. The model is built for real-time iteration, letting developers apply edits immediately and significantly improving workflow efficiency.
Codex-Spark posts strong results on benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0, reaching 77.3% accuracy and surpassing its predecessor, GPT-5.2-Codex. OpenAI has also cut latency across its API, reporting an 80% reduction in client-server overhead.
Codex-Spark is available to ChatGPT Pro users and select design partners; its low-latency design targets demanding workflows while keeping serving costs competitive with GPU-based deployments.
