Tuesday, December 2, 2025

Scientists Unveil “Universal” Jailbreak for Most AIs—Prepare for a Mind-Bending Explanation!

Recent research reveals serious flaws in AI models, making them susceptible to "jailbreaking" techniques, including a surprising method called "adversarial poetry." Researchers from DEXAI and Sapienza University found that by transforming harmful prompts into poetic form, they could trick AI chatbots into ignoring their safety protocols, achieving success rates above 90% in some cases. The study, which is pending peer review, examined 25 AI models, including Google's Gemini 2.5 Pro and OpenAI's GPT-5, and found that even simple verse could mislead them. Handcrafted poems proved more effective than AI-generated ones, and success rates varied notably across models. Smaller models, such as GPT-5 Nano, showed greater resistance to manipulation, suggesting that their larger counterparts may be overconfident when interpreting ambiguous prompts. The study underscores the inadequacy of current AI safety mechanisms and highlights the need for better alignment and evaluation strategies to prevent misuse.
