
Gemini 3.0, Claude, and Grok Declare GPT-5.1 Superior on Andrej Karpathy’s ‘LLM Council’


Andrej Karpathy, an AI researcher and founder of Eureka Labs, recently introduced the “LLM Council”, an experiment in which multiple large language models (LLMs) answer a user query and then anonymously rank each other’s answers. OpenAI’s GPT-5.1 frequently emerged as the highest-rated model, in contrast to earlier benchmarks that favored Google’s Gemini 3.0. Karpathy noted that LLMs often acknowledge when a peer’s response is better than their own, which makes the setup a useful way to evaluate models. The experiment follows a three-step process: the user query is sent to all models, the models anonymously rank each other’s responses, and a “chairman model” consolidates these rankings into a single coherent answer. Karpathy cautioned that the rankings are subjective and do not always match his own assessment: he finds GPT-5.1 verbose compared with the more concise Gemini 3.0. Vasuman M of Varick AI Agents reported similar outcomes in his own experiments, which consistently identified GPT-5.1 as the top performer; other models even corrected themselves once they were made aware of GPT’s output.
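The three-step process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Karpathy’s actual code: models are stood in for by plain functions, `run_council` and `rank_fn` are hypothetical names, and averaging rank positions is an assumed way to summarize the votes, since the article does not say how the rankings are combined before the chairman sees them.

```python
import random
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical model interface: a function from prompt text to reply text.
Model = Callable[[str], str]
# Hypothetical ranking interface: (judge name, query, {label: answer}) -> labels, best first.
RankFn = Callable[[str, str, Dict[str, str]], List[str]]


def run_council(query: str, members: Dict[str, Model], chairman: Model,
                rank_fn: RankFn, seed: int = 0) -> dict:
    # Step 1: every council member answers the query independently.
    answers = {name: model(query) for name, model in members.items()}

    # Step 2: hide model names behind shuffled labels, then let each member rank.
    names = list(answers)
    random.Random(seed).shuffle(names)
    label_to_name = {f"Response {chr(65 + i)}": n for i, n in enumerate(names)}
    anonymized = {label: answers[n] for label, n in label_to_name.items()}
    rankings = {judge: rank_fn(judge, query, anonymized) for judge in members}

    # Assumed aggregation: mean position per model across judges (1 = best).
    positions: Dict[str, List[int]] = {n: [] for n in members}
    for order in rankings.values():
        for pos, label in enumerate(order, start=1):
            positions[label_to_name[label]].append(pos)
    mean_rank = {n: mean(p) for n, p in positions.items()}

    # Step 3: the chairman sees the answers and mean ranks and writes one final reply.
    summary = "\n".join(
        f"{label} (mean rank {mean_rank[label_to_name[label]]}): {text}"
        for label, text in anonymized.items()
    )
    final = chairman(f"Question: {query}\n{summary}\nWrite one final answer.")
    return {"answers": answers, "mean_rank": mean_rank, "final": final}


# Stand-in models and a toy judge that simply prefers longer answers.
demo_members = {
    "model_a": lambda q: "short",
    "model_b": lambda q: "a much longer answer",
    "model_c": lambda q: "medium one",
}


def rank_by_length(judge: str, query: str, responses: Dict[str, str]) -> List[str]:
    return sorted(responses, key=lambda label: len(responses[label]), reverse=True)


result = run_council("What is an LLM council?", demo_members,
                     chairman=lambda prompt: "FINAL: " + prompt.splitlines()[0],
                     rank_fn=rank_by_length)
print(result["mean_rank"])
```

In a real setup, each stand-in function would call a model API, and `rank_fn` would prompt the judging model with the anonymized responses and parse the ordering from its reply. The shuffled labels are what keep the ranking anonymous: a judge sees only “Response A”, “Response B”, and so on, never the model names.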
