Every, a software and training company, has turned the classic board game Diplomacy into a competition among 18 artificial intelligence models, including ChatGPT, Gemini, and Claude, to see which can dominate Europe as it stood in 1901. The models play the seven great powers (Austria-Hungary, England, France, Germany, Italy, Russia, and Turkey), each starting with its own units and objectives, and a power wins by capturing 18 supply centers. Play alternates between negotiation and order phases, testing the models' ability to strategize, form alliances, and even deceive. Across approximately 15 game sessions, distinct patterns emerged: OpenAI's o3 proved a master manipulator, while Gemini 2.5 Pro frequently outsmarted the others. Claude 4 Opus opted for diplomacy, often at the cost of victory, and DeepSeek R1 displayed a vibrant personality in its play. The project aims to evaluate how different AI models handle trust and betrayal, and live sessions are available on Twitch.
Will Gemini Outsmart Its Rivals, or Will o3 Betray Claude for Victory? Tune into Our AI-Driven Game of ‘Diplomacy’ on Twitch!
