A recent study finds that even advanced AI agents struggle with online freelance work, undercutting claims that AI will imminently replace office workers. The Remote Labor Index, developed by Scale AI and the Center for AI Safety (CAIS), assessed how well leading AI models could carry out freelance tasks. The agents completed fewer than 3% of the tasks, earning only $1,810 of a possible $143,991 (about 1.3%). The top performer was Manus, followed by Grok, Claude, ChatGPT, and Gemini. CAIS director Dan Hendrycks emphasized that while AI has improved at coding and logical reasoning, it still lacks long-term memory and the ability to learn on the job, both key human skills. The findings contrast sharply with OpenAI's GDPval benchmark, which suggests that AI models are approaching human competence on a range of office tasks. The results highlight the limits of AI in complex work environments and caution against exaggerated predictions of widespread job displacement.
