Anthropic, working with the US government, has developed a tool to prevent its AI models from assisting in nuclear weapons development. Announced on Thursday, the "classifier" was designed over the past year in partnership with the National Nuclear Security Administration. It is meant to flag and block conversations that veer into weapons-related nuclear territory while leaving benign discussions of nuclear science unaffected. As AI capabilities advance, Anthropic stresses the need to monitor whether models can inadvertently share hazardous technical knowledge.

The initiative grew out of "red teaming" exercises with the Energy Department that began in 2024. The classifier is already deployed on Anthropic's Claude models, and the company plans to share its methodology through the Frontier Model Forum, encouraging the broader AI industry to adopt similar safeguards. The move reflects Anthropic's stated commitment to responsible AI deployment and national security.
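Anthropic has not published the classifier's internals, so the following is only a minimal sketch of the general pattern such a safety gate tends to follow: a classifier scores each exchange, and responses above a risk threshold are refused. The labels, threshold, and keyword check below are all hypothetical placeholders, not Anthropic's implementation.

```python
# Illustrative safety-classifier gate. Everything here (labels, threshold,
# the trivial keyword "classifier") is a hypothetical stand-in; a real
# system would call a trained model instead.
from dataclasses import dataclass


@dataclass
class ClassifierResult:
    label: str    # e.g. "benign" or "nuclear_weapons_risk"
    score: float  # classifier confidence in [0, 1]


def classify(text: str) -> ClassifierResult:
    """Stand-in for a trained content classifier.

    Faked with a keyword check purely so the sketch is runnable.
    """
    risky_terms = ("enrichment cascade", "weapon design", "pit geometry")
    hit = any(term in text.lower() for term in risky_terms)
    return ClassifierResult(
        label="nuclear_weapons_risk" if hit else "benign",
        score=0.97 if hit else 0.02,
    )


BLOCK_THRESHOLD = 0.9  # hypothetical operating point


def guarded_reply(prompt: str, model_reply: str) -> str:
    """Refuse when the classifier flags the exchange above the threshold."""
    result = classify(prompt + "\n" + model_reply)
    if result.label != "benign" and result.score >= BLOCK_THRESHOLD:
        return "I can't help with that request."
    return model_reply


if __name__ == "__main__":
    # Flagged: weapons-related query is refused.
    print(guarded_reply("Explain weapon design basics", "First, ..."))
    # Allowed: benign nuclear-energy question passes through.
    print(guarded_reply("How do nuclear power plants work?", "Reactors ..."))
```

The key design choice in this pattern is where the threshold sits: too low and benign nuclear-energy or medical discussions get blocked, too high and risky content slips through, which is presumably why the classifier was tuned with NNSA input.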