Gemini 3: Mastering Visual Reasoning and Vending Machine Operations

November 20, 2025

Measuring AI model capabilities involves more than basic tests like spelling or arithmetic. Researchers use complex benchmarks created by various organizations, and one notable example is Vending-Bench 2 by Andon Labs. The assessment simulates a vending machine business that an AI model runs for a simulated year, starting with a $500 balance. Success is measured by the amount of cash remaining at year-end. The simulated environment presents challenges such as supplier negotiations, weather impacts, and price management.

Google's Gemini 3 Pro excelled in this test, ending with $5,478 and outperforming competitors such as Anthropic's Claude Sonnet 4.5 and OpenAI's GPT-5.1, which struggled with trust issues in their supplier relationships. While Gemini's performance is promising, it still falls well short of what a human could achieve, with effective strategies estimated to yield around $63,000 annually. Gemini 3 Pro also posted strong scores on more traditional AI benchmarks, positioning it as a central part of Google's AI ecosystem and the applications built on it.
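To put those figures in proportion, the short Python sketch below works through the arithmetic using only the numbers quoted above. The function names are illustrative and not part of Vending-Bench 2, and comparing the $63,000 estimate directly against the $5,478 end balance is an assumption made here for illustration.

```python
# Figures quoted in the article; everything else is illustrative.
STARTING_BALANCE = 500.0      # dollars at the start of the simulated year
GEMINI_FINAL = 5478.0         # Gemini 3 Pro's reported year-end balance
ESTIMATED_CEILING = 63000.0   # estimated yield of an effective strategy


def growth_multiple(final: float, start: float = STARTING_BALANCE) -> float:
    """Return how many times the starting balance the final balance is."""
    return final / start


def share_of_ceiling(final: float, ceiling: float = ESTIMATED_CEILING) -> float:
    """Return the final balance as a fraction of the estimated ceiling.

    Assumes the ceiling is directly comparable to a year-end balance.
    """
    return final / ceiling


if __name__ == "__main__":
    print(f"Growth multiple: {growth_multiple(GEMINI_FINAL):.2f}x")
    print(f"Net gain: ${GEMINI_FINAL - STARTING_BALANCE:,.0f}")
    print(f"Share of estimated ceiling: {share_of_ceiling(GEMINI_FINAL):.1%}")
```

Under these assumptions, Gemini 3 Pro grew its starting balance roughly elevenfold while reaching under a tenth of the estimated ceiling.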
