
NVIDIA Unveils GB300 to Cut Costs of AI Agent Inference


NVIDIA’s new GB300 NVL72 platform significantly improves performance for agentic AI and coding assistants, delivering up to 50x higher throughput per megawatt than the previous Hopper platform. That translates into up to 35x lower cost per token for low-latency inference, making the system well suited to interactive AI applications. Major cloud providers, including Microsoft and Oracle, are deploying GB300 NVL72 systems to meet growing demand for fast responses and large-context management in coding tasks. NVIDIA emphasizes that these generational improvements yield better total cost of ownership (TCO) as well as raw performance, and recent software updates, including TensorRT-LLM optimizations, further strengthen low-latency serving. Looking ahead, NVIDIA’s upcoming Vera Rubin platform promises additional efficiency gains, with a projected 10x further improvement in throughput per megawatt. As inference becomes central to AI production, NVIDIA positions the GB300 NVL72 as a cost-effective, scalable foundation for AI workloads, improving token economics for enterprises.
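The throughput-per-megawatt framing can be made concrete with a back-of-the-envelope calculation. The sketch below uses hypothetical numbers and an assumed fixed hourly cost per megawatt; none of the figures are NVIDIA-published data. It also shows why a 50x throughput gain need not map exactly to a 50x cost reduction: if the newer platform's all-in cost per megawatt is higher, the per-token saving shrinks (the article cites up to 35x).

```python
# Illustrative token-economics sketch (hypothetical numbers, not NVIDIA data):
# cost per token scales inversely with throughput per megawatt when the
# all-in cost of running a megawatt of capacity is held fixed.

def cost_per_million_tokens(tokens_per_sec_per_mw: float,
                            dollars_per_mw_hour: float) -> float:
    """Dollar cost to generate one million tokens, given inference
    throughput per MW and the hourly cost of one MW of capacity."""
    tokens_per_hour = tokens_per_sec_per_mw * 3600
    return dollars_per_mw_hour / tokens_per_hour * 1_000_000

# Hypothetical baseline platform vs. one with 50x throughput per MW.
# Same $/MW-hour => cost per token drops by exactly the throughput ratio.
baseline = cost_per_million_tokens(1_000_000, 300.0)
improved = cost_per_million_tokens(50_000_000, 300.0)
print(f"reduction at equal $/MW-hour: {baseline / improved:.0f}x")

# If the newer platform costs more per MW-hour (assumed here), the
# realized per-token saving is smaller than the raw throughput gain.
improved_pricier = cost_per_million_tokens(50_000_000, 428.0)
print(f"reduction at higher $/MW-hour: {baseline / improved_pricier:.0f}x")
```

The design point: cost per token is (cost per MW-hour) / (tokens per MW-hour), so throughput-per-megawatt gains flow directly into token economics only to the extent that per-megawatt costs stay comparable.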

Source link
