AI Math Miscalculations: What the Latest Study Reveals
In the realm of artificial intelligence, even the most advanced large language models (LLMs) struggle with basic math. A recent benchmark study called ORCA (Omni Research on Calculation in AI) evaluated five leading models: ChatGPT-5, Gemini 2.5 Flash, Claude Sonnet 4.5, Grok 4, and DeepSeek V3.2. The result? None scored above 63% accuracy.
Key Findings:
- Accuracy Scores:
  - Gemini 2.5 Flash: 63%
  - Grok 4: 62.8%
  - DeepSeek V3.2: 52%
  - ChatGPT-5: 49.4%
  - Claude Sonnet 4.5: 45.2%
- Common Errors:
  - 35% involved rounding inaccuracies
  - 33% were calculation mistakes
Researchers emphasize the need for benchmarks that test true computational reasoning, noting that past evaluations may not reflect real-world capabilities.
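To make the distinction between the two error types concrete, here is a minimal sketch of how a grader might separate a rounding slip from an outright calculation mistake when checking a model's numeric answer. The function name, tolerance values, and examples are illustrative assumptions, not ORCA's published scoring criteria.

```python
import math

def classify_numeric_answer(model_answer: float, ground_truth: float,
                            strict_tol: float = 1e-9,
                            rounding_tol: float = 5e-3) -> str:
    """Classify a model's numeric answer against the ground truth.

    Returns one of:
      - "correct": matches within a strict relative tolerance
      - "rounding_error": close to the truth, consistent with premature rounding
      - "calculation_error": too far off to be explained by rounding
    Thresholds are hypothetical, chosen only for illustration.
    """
    if math.isclose(model_answer, ground_truth, rel_tol=strict_tol):
        return "correct"
    if math.isclose(model_answer, ground_truth, rel_tol=rounding_tol):
        return "rounding_error"
    return "calculation_error"

# Example: a model that rounds 7/3 too early reports 2.33 instead of 2.3333...
print(classify_numeric_answer(2.33, 7 / 3))          # rounding_error
print(classify_numeric_answer(2.3333333333, 7 / 3))  # correct
print(classify_numeric_answer(3.0, 7 / 3))           # calculation_error
```

The idea is that a tight tolerance accepts genuinely correct answers, while a looser band flags values close enough to suggest premature rounding rather than a wrong computation.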
🚀 Engage with us! Share your thoughts on AI’s math reliability and future improvements.
