NVIDIA’s new GB300 NVL72 platform significantly enhances agentic AI and coding-assistant performance, boasting up to 50x higher throughput per megawatt than the previous Hopper platform. This translates into up to 35x lower cost per token for low-latency inference, making it well suited to interactive AI applications. Major cloud providers, including Microsoft and Oracle, are deploying these systems to meet growing demand for fast responses and extensive context management in coding tasks.

NVIDIA emphasizes that generational improvements translate to better total cost of ownership (TCO) and performance. The latest software updates, including TensorRT-LLM optimizations, further boost low-latency capabilities. Looking ahead, NVIDIA’s upcoming Vera Rubin platform promises even greater efficiencies, potentially achieving 10x higher throughput per megawatt. As inference becomes central to AI production, the GB300 NVL72 is positioned as a game-changer for cost-effective, scalable AI workloads, improving token economics and performance for enterprises.
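To make the "throughput per megawatt" and "cost per token" framing concrete, here is a minimal sketch of the underlying arithmetic. All inputs (rack power draw, electricity price, token throughput) are hypothetical placeholders for illustration, not NVIDIA's published figures:

```python
def energy_cost_per_million_tokens(power_kw: float,
                                   price_per_kwh: float,
                                   tokens_per_sec: float) -> float:
    """Electricity cost (in dollars) to serve one million tokens.

    power_kw       -- rack power draw in kilowatts (hypothetical)
    price_per_kwh  -- electricity price in $/kWh (hypothetical)
    tokens_per_sec -- sustained inference throughput (hypothetical)
    """
    cost_per_hour = power_kw * price_per_kwh          # $/hour to run the rack
    tokens_per_hour = tokens_per_sec * 3600           # tokens served per hour
    return cost_per_hour / tokens_per_hour * 1_000_000

# Illustrative numbers only: a 120 kW rack, $0.08/kWh power,
# one million tokens/sec sustained throughput.
baseline = energy_cost_per_million_tokens(120, 0.08, 1_000_000)
print(f"${baseline:.5f} per million tokens")

# Under this model, a generational gain in throughput per megawatt
# reduces cost per token proportionally: e.g. a hypothetical 35x
# throughput gain at the same power cuts the energy cost 35-fold.
improved = energy_cost_per_million_tokens(120, 0.08, 35 * 1_000_000)
print(f"{baseline / improved:.0f}x cheaper per token")
```

The point of the sketch is the relationship, not the numbers: at fixed power draw and energy price, cost per token is inversely proportional to throughput, which is why throughput-per-megawatt gains map directly onto the per-token economics the article describes.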