
Study Reveals Poetry Can Manipulate ChatGPT and Gemini into Providing Harmful Responses

AI can be tricked into answering harmful requests using poetry

The rise of AI chatbots has led to growing concerns about potential misuse. Companies have implemented safety measures to block harmful responses, yet a recent study by Italy’s Icaro Lab reveals vulnerabilities in these AI models. The researchers found that phrasing harmful requests as poetry serves as a “universal single-turn jailbreak,” successfully eliciting dangerous replies. Their testing showed a 62% success rate using 20 carefully crafted poetic prompts across 25 advanced models, including models from Google and OpenAI. Even harmful requests automatically rewritten as poetry achieved a 43% success rate. These results indicate that poetic prompts consistently trigger unsafe responses, often far exceeding plain prose in effectiveness. While smaller models, such as GPT-5 Nano, demonstrated better resistance, larger models struggled, apparently because of their deeper engagement with complex linguistic structures. The research challenges the assumption that closed-source models offer superior safety, highlighting structural weaknesses across AI systems and illustrating how poetry can circumvent conventional safety protocols.


