Saturday, January 17, 2026

ORCA Reveals AI Struggles with Math Accuracy – The Register

AI Math Miscalculations: What the Latest Study Reveals

Even the most advanced large language models (LLMs) struggle with basic math. A recent benchmark study called ORCA (Omni Research on Calculation in AI) evaluated five leading models: ChatGPT-5, Gemini 2.5 Flash, Claude Sonnet 4.5, Grok 4, and DeepSeek V3.2. The results? None exceeded 63% accuracy.

Key Findings:

  • Accuracy Scores:

    • Gemini 2.5 Flash: 63%
    • Grok 4: 62.8%
    • DeepSeek V3.2: 52%
    • ChatGPT-5: 49.4%
    • Claude Sonnet 4.5: 45.2%
  • Common Errors:

    • 35% of errors involved rounding inaccuracies
    • 33% were calculation mistakes

Researchers emphasize the need for benchmarks that test true computational reasoning, noting that past evaluations may not reflect real-world capabilities.
