OpenAI recently introduced GDPval, a benchmark that compares the work of its GPT-5 model with that of industry professionals. The benchmark aims to measure how closely AI systems approach human output in economically significant sectors, a measure OpenAI considers crucial to the pursuit of artificial general intelligence (AGI). OpenAI claims that its GPT-5 and Anthropic’s Claude Opus 4.1 “are already approaching the work quality of industry experts.”
Initial results show that GPT-5’s output was rated as good as or better than the experts’ in 40.6% of cases, while Claude Opus 4.1 reached that bar in 49% of tasks. OpenAI nonetheless acknowledges that GDPval covers only a fraction of actual job responsibilities and plans to create more comprehensive assessments.
The results suggest AI can enhance productivity, allowing professionals to focus on higher-value tasks. OpenAI’s leaders express optimism about the progress GDPval shows and foresee further advances in AI capabilities that will support human workers more effectively.