Live on Twitch, three advanced AI systems (GPT 5.2, Claude Opus 4.5, and Gemini 3 Pro) are attempting to beat classic Pokémon games, though their performance still falls short of human players. The experiments began in February, when Claude Sonnet 3.7 managed only minimal interaction with the game; newer iterations such as Claude Opus 4.5 show improvement but still often get stuck. Gemini’s models, which use more sophisticated harnesses, have shown better results, underscoring how much the design around a model matters in complex tasks. Because these systems must plan over the long term, their runs illustrate the gap between knowledge and execution. The turn-based nature of Pokémon makes it an ideal testbed for understanding AI capabilities. While AI excels in specialized tasks like chess, its struggle with simpler games reveals fundamental limitations. Interestingly, Gemini 3 Pro’s completion of Pokémon Blue and its unique approach to gameplay highlight the potential for AI in broader real-world applications, despite its quirks.
