Tuesday, December 23, 2025

OpenAI Acknowledges Limitations of Prompt Injection Solutions, Raising Concerns About the Future of Agentic AI

OpenAI is addressing the persistent issue of prompt injection attacks targeting language models, particularly within browsers, through a significant security update for ChatGPT Atlas. Despite acknowledging that these text-based attacks may never be fully eradicated, OpenAI remains optimistic about minimizing risks over time. The update features a new adversarially trained model and enhanced security measures, prompted by newly discovered attack types via automated red-teaming.

The agent mode allows ChatGPT Atlas to perform human-like actions, making it susceptible to manipulative instructions hidden in emails or web content. For example, an attacker could embed malicious commands in an email that may lead the AI to send a resignation letter instead of an out-of-office message. OpenAI’s commitment involves continuously training AI agents against automated attackers to refine defenses.

While the company draws parallels between prompt injections and social engineering scams, it recognizes the technical vulnerabilities inherent in language models, underscoring the need for ongoing security enhancements to ensure user trust.

Source link

Share

Read more

Local News