Andrej Karpathy, an AI researcher and founder of Eureka Labs, recently introduced the “LLM-Council”, an experiment in which multiple large language models (LLMs) answer a user's query and then anonymously rank each other's answers. Remarkably, OpenAI’s GPT-5.1 frequently emerged as the highest-rated model, contradicting earlier benchmarks that favored Google’s Gemini 3.0. Karpathy noted that LLMs often acknowledge superior responses from their peers, which makes the cross-evaluations revealing. The experiment follows a three-step process: the user’s query is sent to every model on the council, each model produces an answer and then anonymously ranks the others’ responses, and a “chairman model” consolidates the answers and rankings into a single final response.

While acknowledging the rankings are subjective, Karpathy said they did not fully match his own assessment: he finds GPT-5.1 verbose compared to the more concise Gemini 3.0. Notably, Vasuman M of Varick AI Agents reported similar outcomes in his own experiments, with GPT-5.1 consistently identified as the top performer, and other models even revising their answers after seeing GPT-5.1’s output.
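The three-step council flow described above can be sketched in a few lines of Python. This is a hedged illustration, not code from Karpathy's actual LLM-Council project: the model stubs, the `run_council` function, and the seeded shuffle that stands in for each model's ranking judgment are all hypothetical placeholders for real LLM API calls.

```python
import random

# Hypothetical stand-ins for real LLM API calls (the actual project
# dispatches to providers such as OpenAI and Google).
MODELS = {
    "model_a": lambda q: f"model_a answer to: {q}",
    "model_b": lambda q: f"model_b answer to: {q}",
    "model_c": lambda q: f"model_c answer to: {q}",
}

def run_council(query: str, chairman: str = "model_a", seed: int = 0) -> str:
    # Step 1: send the user query to every council member.
    answers = {name: fn(query) for name, fn in MODELS.items()}

    # Step 2: anonymize answers so a ranker cannot favor a known peer,
    # then have each member rank the others' responses. A real member
    # would be prompted to judge quality; a seeded shuffle stands in.
    labels = {f"Response {i + 1}": name for i, name in enumerate(answers)}
    rng = random.Random(seed)
    scores = {name: 0 for name in answers}
    for ranker in MODELS:
        order = [lbl for lbl, name in labels.items() if name != ranker]
        rng.shuffle(order)  # placeholder for the model's actual judgment
        for rank, label in enumerate(order):
            scores[labels[label]] += len(order) - rank  # higher rank, more points

    # Step 3: the chairman consolidates the rankings into one final answer.
    best = max(scores, key=scores.get)
    return f"[{chairman} summarizing] top-ranked {best}: {answers[best]}"

print(run_council("Why is the sky blue?"))
```

In the real system each member both answers and judges, and self-ranking is excluded by the anonymized labels; the chairman's synthesis step is where a model like GPT-5.1 can end up shaping the final response even when another model authored parts of it.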