Google’s recent evaluation of AI chatbots reveals concerning results. Using the FACTS Benchmark Suite, the company found that no AI model, including its own Gemini 3 Pro, surpasses a factual accuracy rate of 70%. Gemini 3 Pro leads at just 69%, which means even the best chatbot can deliver incorrect answers roughly a third of the time, despite its confident tone.

The FACTS Benchmark, developed in collaboration with Kaggle, measures the accuracy of the information a model provides rather than mere task completion, an essential distinction for sectors like healthcare and finance. Performance varied widely: Gemini 2.5 Pro and OpenAI’s GPT-5 scored around 62%, while Claude Opus 4.5 reached 54%. Notably, multimodal tasks showed even weaker results.

While Google acknowledges the progress of AI chatbots, it emphasizes the necessity of human oversight and skepticism to avoid potential pitfalls, underscoring the importance of reliable AI technology for critical applications.
