Tuesday, December 30, 2025

Testing the Best AI Chatbots for Simple Math: Gemini, ChatGPT, and Grok in Action

Artificial Intelligence (AI) is increasingly used for everyday math tasks, but a recent study reveals concerning accuracy levels. The Omni Research on Calculation in AI (ORCA) indicates that AI chatbots answer basic calculation prompts incorrectly roughly 40% of the time. Five leading models were tested on 500 prompts, with Google's Gemini 2.5 topping the leaderboard at 63% accuracy, closely followed by Grok-4 at 62.8%. DeepSeek V3.2, ChatGPT-5, and Claude 4.5 lag behind with scores of 52%, 49.4%, and 45.2%, respectively.

Performance varies by topic: Gemini leads in math and unit conversions at 83%, while the physics score falls to a dismal 35.8%, and DeepSeek managed only 10.6% in biology and chemistry. Experts advise users to double-check AI outputs with a calculator or trusted sources, particularly for critical calculations, since these models still struggle with computation and precision. Understanding these limitations is crucial before trusting AI tools with mathematical tasks.
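As a practical illustration of that advice, the sketch below shows one way to sanity-check a chatbot's numeric answer with a deterministic calculation before relying on it. It is not part of the ORCA study; the function name, the tolerance, and the example unit-conversion values are illustrative assumptions.

```python
# Minimal sketch (illustrative only, not from the study): verify a chatbot's
# arithmetic or unit-conversion answer against an independent calculation.
from decimal import Decimal

def verify_answer(expected: Decimal, claimed: Decimal,
                  rel_tol: Decimal = Decimal("0.001")) -> bool:
    """Return True if the claimed value is within rel_tol of the expected value."""
    if expected == 0:
        return claimed == 0
    return abs(claimed - expected) / abs(expected) <= rel_tol

# Example: a chatbot claims that 26.2 miles is "about 42.5 km".
MILES_TO_KM = Decimal("1.609344")            # exact definition of the statute mile
expected_km = Decimal("26.2") * MILES_TO_KM  # independent computation: ~42.16 km
claimed_km = Decimal("42.5")                 # the chatbot's answer

print(f"Expected: {expected_km} km, chatbot claimed: {claimed_km} km")
print("Accept" if verify_answer(expected_km, claimed_km)
      else "Reject: recompute before using this figure")
```

Here the chatbot's rounded figure fails the 0.1% tolerance check, which is exactly the kind of quiet imprecision the study warns about; for anything critical, a quick deterministic recomputation like this costs almost nothing.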
