Apple’s paper, “The Illusion of Thinking,” released ahead of WWDC 2025, challenges assumptions in the AI reasoning space, focusing not on benchmarks but on how models behave in controlled environments where problem complexity can be scaled precisely. The study finds that while AI models perform solidly on simpler tasks, their reasoning abilities collapse abruptly once complexity passes a threshold. The failure is not a gradual decline in accuracy: near the collapse point, models actually reduce their reasoning effort and effectively stop attempting to solve the problem.
Interestingly, even when provided with established algorithms, models like Claude 3.7 Sonnet Thinking and OpenAI’s o1/o3 struggle to execute them reliably. The paper identifies three performance regimes: at low complexity, standard models often match or outperform reasoning models; at medium complexity, reasoning models gain an edge; and at high complexity, both collapse. Notably, even erroneous outputs may appear fluent and convincing, blurring the line between success and failure. Apple emphasizes the importance of understanding these limits when building reliable AI systems, advocating for structured approaches and a clear-eyed view of model capabilities.
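Among the paper's controlled puzzle environments is Tower of Hanoi, whose difficulty can be dialed up simply by adding disks. As a hedged illustration of the kind of "established algorithm" the study supplied to models, here is the standard recursive solution in Python; the function name and output format are my own, not the paper's:

```python
def solve_hanoi(n, source=0, target=2, aux=1, moves=None):
    """Standard recursive Tower of Hanoi: move n disks from source to target.

    Returns the move list as (from_peg, to_peg) pairs. The solution length
    is 2**n - 1, so adding disks scales complexity exponentially -- the
    kind of controlled difficulty knob the study relies on.
    """
    if moves is None:
        moves = []
    if n == 0:
        return moves
    solve_hanoi(n - 1, source, aux, target, moves)  # clear n-1 disks aside
    moves.append((source, target))                  # move the largest disk
    solve_hanoi(n - 1, aux, target, source, moves)  # restack on top of it
    return moves

# Solution length grows exponentially with disk count:
for n in (3, 5, 10):
    print(n, len(solve_hanoi(n)))  # 7, 31, and 1023 moves respectively
```

The point of giving models this algorithm verbatim is that execution, not discovery, becomes the only remaining task; the paper reports that models still fail to carry out the move sequence reliably at scale.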