OpenAI and Anthropic are collaborating with the U.S. and U.K. governments to harden their large language models (LLMs) against misuse. The partnership, documented in recent blog posts, gives researchers at the U.S. National Institute of Standards and Technology (NIST) and the U.K. AI Security Institute access to the companies' models for independent evaluation, with the aim of identifying vulnerabilities, including potential attack vectors that could compromise security. OpenAI discovered two significant vulnerabilities that could have allowed sophisticated attacks and has since reinforced the safeguards in products such as GPT-5 and ChatGPT. Anthropic likewise shared its Claude models for testing and uncovered critical vulnerabilities, prompting a complete restructuring of its safeguard architecture. Despite concerns that competitiveness is being prioritized over safety, experts say commercial models are becoming more secure. AI safety remains a debated topic, but the ongoing collaboration signals a continued commitment to addressing vulnerabilities in these systems.