Gemini 3: Mastering Visual Reasoning and Vending Machine Operations


Measuring an AI model's capabilities takes more than basic tests such as spelling or arithmetic. Researchers rely on complex benchmarks built by various organizations, and one notable example is Vending-Bench 2 from Andon Labs. The benchmark simulates a vending machine business that an AI model runs for a full year, starting with a $500 balance; the score is the amount of cash the model holds at the end of the year. Along the way, the simulated environment presents challenges such as negotiating with suppliers, coping with the effects of weather, and managing prices.

Google's Gemini 3 Pro performed best on this test, finishing the year with $5,478 and outperforming competitors such as Anthropic's Claude Sonnet 4.5 and OpenAI's GPT-5.1, which struggled with trust issues in their supplier relationships. While Gemini's result is promising, it still falls well short of human potential: a person following an effective strategy could earn around $63,000 over the year. Gemini 3 Pro also posts leading scores on traditional AI benchmarks, which places it at the center of Google's AI ecosystem, where it is expected to drive growth across a range of applications.
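To put these figures in perspective, the short calculation below compares each end-of-year balance with the $500 starting balance. It is a minimal sketch that uses only the numbers quoted in this article; `summarize_run` is a hypothetical helper written for illustration, not part of Vending-Bench 2 or any Andon Labs tooling.

```python
# Hypothetical helper, not part of Vending-Bench 2: the benchmark scores a run
# by the cash balance left at the end of the simulated year.
STARTING_BALANCE = 500  # dollars, per the benchmark description


def summarize_run(final_balance: float, start: float = STARTING_BALANCE) -> dict:
    """Return net profit and growth multiple relative to the starting balance."""
    return {
        "net_profit": final_balance - start,
        "multiple": final_balance / start,
    }


gemini = summarize_run(5_478)           # Gemini 3 Pro's reported final balance
human_estimate = summarize_run(63_000)  # reported estimate for an effective strategy

print(gemini)
print(human_estimate)
# How many times larger the human estimate is than Gemini's result
print(round(63_000 / 5_478, 1))
```

By this measure, Gemini 3 Pro grew its balance roughly elevenfold (a net profit of $4,978), while the reported human estimate is about 11.5 times Gemini's final balance.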
