Joel Zhang of the ARISE Foundation benchmarked Gemini 3 Pro against Gemini 2.5 Pro in Pokémon Crystal. Gemini 3 Pro came out well ahead: it earned all 16 badges and defeated elite opponents, including the hidden boss Red, while Gemini 2.5 Pro earned only four badges. Gemini 3 Pro completed the game in 17 days using 1.88 billion tokens, whereas Gemini 2.5 Pro was estimated to need 69 days and more than 15 billion tokens. Along the way, Gemini 3 Pro built a custom tool, “press_sequence”, for more efficient play, and it used multimodal input to improve its decisions, for example by spotting discrepancies in game puzzles from visual cues. It still had limitations: it did little proactive planning and sometimes acted on hypotheses it had not verified. Overall, the benchmark shows Gemini 3 Pro making substantial gains over Gemini 2.5 Pro in agentic reasoning and problem-solving efficiency.
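To put the reported figures in proportion, the short Python sketch below computes the time and token ratios between the two runs. It uses only the numbers quoted above; the variable and function names are illustrative, and because the Gemini 2.5 Pro figures are estimates (with tokens given as "over 15 billion"), the results should be read as rough approximations rather than measured values.

```python
# Figures quoted in the summary; the Gemini 2.5 Pro values are estimates.
GEMINI_3_DAYS = 17
GEMINI_3_TOKENS = 1.88e9
GEMINI_25_DAYS_EST = 69       # estimated, not a completed run
GEMINI_25_TOKENS_EST = 15e9   # "over 15 billion", so treated as a lower bound


def ratio(baseline: float, improved: float) -> float:
    """Return how many times larger the baseline is than the improved value."""
    return baseline / improved


time_ratio = ratio(GEMINI_25_DAYS_EST, GEMINI_3_DAYS)
token_ratio = ratio(GEMINI_25_TOKENS_EST, GEMINI_3_TOKENS)

print(f"Time: roughly {time_ratio:.1f}x less")
print(f"Tokens: at least roughly {token_ratio:.1f}x fewer")
```

On these figures, Gemini 3 Pro needed roughly a quarter of the time and about an eighth of the tokens.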