Monday, December 1, 2025

Gemini 3: Mastering Visual Reasoning and Vending Machine Operations

Measuring an AI model's capabilities takes more than basic tests such as spelling or arithmetic. Researchers rely on complex benchmarks created by various organizations, one notable example being Vending-Bench 2 from Andon Labs. The benchmark simulates a vending machine business that an AI model runs for a simulated year, starting with a $500 balance. Success is measured by the amount of cash remaining at year-end. Along the way, the simulated environment presents challenges such as supplier negotiations, weather effects, and price management. Google's Gemini 3 Pro led this test, ending the year with $5,478 and outperforming competitors such as Anthropic's Claude Sonnet 4.5 and OpenAI's GPT-5.1, which struggled with trust issues in supplier relationships. Promising as that result is, it still falls well short of what a human could achieve: effective human strategies could yield around $63,000 a year. Gemini 3 also posts superior scores on traditional AI benchmarks, which positions it as a key part of Google's AI ecosystem and a driver of growth across various applications.

