Alibaba’s Qwen3 Upgrade Outperforms OpenAI and DeepSeek in Benchmark Tests

Alibaba has launched an impressive update to its open-source Qwen3 family, introducing the Qwen3‑235B‑A22B‑Instruct‑2507‑FP8 model. This upgrade enhances capabilities in instruction understanding, logical reasoning, text analysis, mathematics, science, coding, and tool integration, making it a leader in key benchmarks. Notably, the model scored 70.3 on the American Invitational Mathematics Exam, surpassing competitors like DeepSeek-V3 and OpenAI’s GPT‑4o. In coding assessments, it achieved 87.9 in MultiPL‑E, outperforming both DeepSeek and OpenAI, while Anthropic’s Claude Opus 4 slightly led with 88.5. A significant advancement is the context capacity increase to 256k tokens, enabling it to manage longer documents in non-thinking mode. This open-source release on platforms like HuggingFace and ModelScope emphasizes Alibaba’s dedication to a transparent, high-performance AI ecosystem, intensifying competition within China’s AI market against Western counterparts and local startups like DeepSeek.

Source link

News

Company:

Join our community of SUBSCRIBERS and be part of the conversation.

Walmart Media Partner Embraces OpenAI as Retail Giant Moves Away – The Arkansas Democrat-Gazette

Exploring OpenAI’s Vulnerabilities: Dependence on Microsoft, Legal Challenges from Elon Musk, and xAI

David Dugan, Former Meta Ad Expert, Appointed to Lead OpenAI’s Advertising Team – 03/24/2026

Unwilling to Say Goodbye to ChatGPT? I Tested a Tool That Deliberately Slows It Down — Insights on the Growing AI Backlash

“Google Unveils Gemini AI-Driven Features for Enhanced Marketing Solutions” – Social Media Today

Revolutionizing FrontierMath: The Inaugural AI Solution for Open Problems

Introducing Cronbox: Your AI Agent Scheduling Solution

Bezos Advocates for AI to Streamline Building Permits in Just 10 Seconds: Insights from the Data.

Show HN: Simplifying AI Management for Cloudflare WAF, Stripe, and Supabase

Product Teams vs. Feature Teams: Navigating the AI Landscape

Alibaba’s Qwen3 Upgrade Outperforms OpenAI and DeepSeek in Benchmark Tests

Chartis Enhances Client Solutions by Bringing AI Development In-House – Crain’s Chicago Business

An AI Coach for Mastering Difficult Conversations and Social Skills

Revolutionizing 3nm Semiconductors: The Impact of AI-Powered EDA Tools

Discover HN: Illux – AI-Driven Illustrations Tailored to Your Brand Identity

Mark Zuckerberg Develops AI CEO Agent, Raising Concerns About the Future of Leadership

Local News

Walmart Media Partner Embraces OpenAI as Retail Giant Moves Away – The Arkansas Democrat-Gazette

Revolutionizing FrontierMath: The Inaugural AI Solution for Open Problems

Exploring OpenAI’s Vulnerabilities: Dependence on Microsoft, Legal Challenges from Elon Musk, and xAI

Introducing Cronbox: Your AI Agent Scheduling Solution

Walmart Media Partner Embraces OpenAI as Retail Giant Moves Away – The Arkansas Democrat-Gazette

Revolutionizing FrontierMath: The Inaugural AI Solution for Open Problems

Exploring OpenAI’s Vulnerabilities: Dependence on Microsoft, Legal Challenges from Elon Musk, and xAI