ORCA Reveals AI Struggles with Math Accuracy – The Register

AI Math Miscalculations: What the Latest Study Reveals

Even the most advanced large language models (LLMs) struggle with basic math. A recent benchmark study called ORCA (Omni Research on Calculation in AI) evaluated five leading models: ChatGPT-5, Gemini 2.5 Flash, Claude Sonnet 4.5, Grok 4, and DeepSeek V3.2. The result: none scored above 63% accuracy.

Key Findings:

Accuracy Scores:
- Gemini 2.5 Flash: 63%
- Grok 4: 62.8%
- DeepSeek V3.2: 52%
- ChatGPT-5: 49.4%
- Claude Sonnet 4.5: 45.2%

Common Errors:
- Rounding inaccuracies: 35% of errors
- Calculation mistakes: 33% of errors
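
The difference between the two error categories can be illustrated with a small sketch. This is not ORCA's grading method, which is not described here. The function name `classify_answer`, the threshold of one unit in the last required decimal place, and the sample values are all assumptions chosen for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def classify_answer(model_answer: str, exact: str, places: int = 2) -> str:
    """Label an answer 'correct', 'rounding', or 'calculation'.

    Hypothetical heuristic: an answer that misses the correctly rounded
    value by at most one unit in the last required decimal place is
    treated as a rounding inaccuracy; anything further off is treated
    as a calculation mistake.
    """
    unit = Decimal(10) ** -places  # e.g. 0.01 when places=2
    got = Decimal(model_answer)
    want = Decimal(exact).quantize(unit, rounding=ROUND_HALF_UP)
    if got == want:
        return "correct"
    if abs(got - want) <= unit:
        return "rounding"
    return "calculation"


# Illustrative (made-up) answers to a problem whose exact result is 2.675
samples = ["2.68", "2.67", "26.75"]
labels = [classify_answer(answer, "2.675") for answer in samples]
print(labels)  # ['correct', 'rounding', 'calculation']
```

Here "2.67" is what truncating instead of rounding would produce, so it lands in the rounding category, while "26.75" is off by a factor of ten and counts as a calculation mistake. A real benchmark would likely need per-problem tolerances and unit handling.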

The researchers emphasize the need for benchmarks that test true computational reasoning, and the results suggest that past evaluations may not reflect real-world capabilities.

