Gemini 3.0, Claude, and Grok Declare GPT-5.1 Superior on Andrej Karpathy’s ‘LLM Council’

November 24, 2025

Andrej Karpathy, an AI researcher and the founder of Eureka Labs, recently introduced the “LLM Council”, an experiment in which multiple large language models (LLMs) answer the same user query and then anonymously rank one another’s answers. OpenAI’s GPT-5.1 frequently emerged as the highest-rated model, in contrast to earlier benchmarks that favored Google’s Gemini 3.0. Karpathy noted that the models are often willing to acknowledge that a peer’s response is better than their own, which makes the setup a useful way to evaluate models. The experiment follows a three-step process: the user query is sent to all models, the models rank each other’s responses without knowing which model wrote which, and a “chairman model” consolidates these rankings into a single coherent answer. Karpathy cautioned that the rankings are subjective and do not fully match his own assessment: he finds GPT-5.1 verbose compared with the more concise Gemini 3.0. Vasuman M of Varick AI Agents reported similar outcomes in his own experiments, in which GPT-5.1 was consistently identified as the top performer and other models even corrected themselves once they were shown GPT’s output.
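The three-step process described above can be sketched in a few lines of Python. This is an illustration only, not Karpathy’s implementation: the models, rankers, and chairman are hypothetical stand-in functions rather than real API calls, and averaging rank positions is an assumed aggregation rule that the article does not specify.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins (not a real API): a model maps a prompt to text,
# a ranker returns anonymous labels ordered best-first, and a chairman
# turns the labelled answers plus scores into one final answer.
Model = Callable[[str], str]
Ranker = Callable[[str, Dict[str, str]], List[str]]
Chairman = Callable[[str, Dict[str, str], Dict[str, float]], str]


def collect_answers(models: Dict[str, Model], query: str) -> Dict[str, str]:
    """Step 1: send the same query to every council member."""
    return {name: model(query) for name, model in models.items()}


def anonymize(answers: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Map neutral labels ("Response A", ...) to model names in shuffled order."""
    names = list(answers)
    rng.shuffle(names)
    return {f"Response {chr(ord('A') + i)}": name for i, name in enumerate(names)}


def mean_ranks(rankings: List[List[str]]) -> Dict[str, float]:
    """Assumed aggregation: average position of each label (1 = best)."""
    totals: Dict[str, int] = {}
    for ranking in rankings:
        for position, label in enumerate(ranking, start=1):
            totals[label] = totals.get(label, 0) + position
    return {label: total / len(rankings) for label, total in totals.items()}


def run_council(
    models: Dict[str, Model],
    rankers: Dict[str, Ranker],
    chairman: Chairman,
    query: str,
    seed: int = 0,
) -> Tuple[List[Tuple[str, float]], str]:
    rng = random.Random(seed)
    answers = collect_answers(models, query)
    label_to_name = anonymize(answers, rng)
    labelled = {label: answers[name] for label, name in label_to_name.items()}
    # Step 2: every ranker sees only the anonymous labels, never model names.
    rankings = [rank(query, labelled) for rank in rankers.values()]
    scores = mean_ranks(rankings)
    # Model names are revealed only after all rankings are in.
    leaderboard = sorted(
        ((label_to_name[label], score) for label, score in scores.items()),
        key=lambda item: item[1],
    )
    # Step 3: the chairman consolidates everything into one answer.
    return leaderboard, chairman(query, labelled, scores)


# Toy council: rankers prefer shorter answers; the chairman echoes the top one.
def prefer_short(query: str, labelled: Dict[str, str]) -> List[str]:
    return sorted(labelled, key=lambda label: len(labelled[label]))


def echo_best(query: str, labelled: Dict[str, str], scores: Dict[str, float]) -> str:
    return labelled[min(scores, key=scores.get)]


toy_models = {
    "model_a": lambda q: "Short answer.",
    "model_b": lambda q: "A considerably longer and more verbose answer.",
}
toy_rankers = {"judge_1": prefer_short, "judge_2": prefer_short}

leaderboard, final_answer = run_council(toy_models, toy_rankers, echo_best, "Why is the sky blue?")
print(leaderboard)
print(final_answer)
```

In a real setup each stand-in would be a call to a hosted model, with the ranker and chairman prompts doing the work that `prefer_short` and `echo_best` fake here. The point of the sketch is the ordering: labels are shuffled before ranking and mapped back to model names only afterwards, so no model knows whose answer it is judging.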
