New ‘Echo Chamber’ Attack Exploits GPT and Gemini, Posing Safety Risks

Researchers conducted a study on the Echo Chamber attack against two leading language models (LLMs), testing 200 jailbreak attempts across eight sensitive categories adapted from the Microsoft Crescendo benchmark. The categories included profanity, sexism, violence, hate speech, misinformation, illegal activities, self-harm, and pornography. The results revealed that for sexism, violence, hate speech, and pornography, the attack successfully bypassed safety filters over 90% of the time. Misinformation and self-harm attempts achieved an 80% success rate, while profanity and illegal activities saw a lower 40% bypass rate due to stricter enforcement. Effective steering prompts often involved storytelling or hypothetical scenarios, with most successful attacks occurring within 1-3 manipulative turns. The study recommends that LLM vendors implement dynamic, context-aware safety checks, such as toxicity scoring for multi-turn conversations and training models to better detect indirect prompt manipulations.

Source link

News

Company:

Join our community of SUBSCRIBERS and be part of the conversation.

“Admiring Talent: Bassist Mohini Dey and Others Face Backlash for Supporting Generative AI Music Tools” – Ultimate Guitar

Copyright Challenges Intensify as OpenAI Faces Off Against Newspapers and Piracy

Revolutionizing Content Creation: The Impact of Video Watermark Removers and AI Room Design Apps – StreetInsider

Breakthrough Study Showcases Hologic’s Innovative AI Tools in Mammography

Class Action Lawsuit Takes Aim at Google’s Gemini Features

Showcasing Runway Gen 4.5: Inspiring Examples of AI Video Generation

Why a Dull AI Coworker is the Key to Success: Embracing RPA’s Wisdom

AI Model Analyzes Prison Phone Calls to Detect Potential Crimes

Schoblaska/Jargon: A Personal Research Library for Article Analysis, Insight Extraction, and Cross-Domain Connections.

Navigating the Future: AI Insights on Subtle Corporate Influence in Independent Content

New ‘Echo Chamber’ Attack Exploits GPT and Gemini, Posing Safety Risks

Three Years Later: ChatGPT Falls Short of Expectations and May Never Meet Them

Unveiling AI Security: A Vision for 2025

Transformer Creator Reveals the Inside Story Behind GPT

Colleges Might Misstep on AI, Harming Gen Z Job Seekers in the Process

AI Tools Propel Massive US Traffic Surge on Black Friday

Local News

Showcasing Runway Gen 4.5: Inspiring Examples of AI Video Generation

“Admiring Talent: Bassist Mohini Dey and Others Face Backlash for Supporting Generative AI Music Tools” – Ultimate Guitar

Why a Dull AI Coworker is the Key to Success: Embracing RPA’s Wisdom

Copyright Challenges Intensify as OpenAI Faces Off Against Newspapers and Piracy

Showcasing Runway Gen 4.5: Inspiring Examples of AI Video Generation

“Admiring Talent: Bassist Mohini Dey and Others Face Backlash for Supporting Generative AI Music Tools” – Ultimate Guitar

Why a Dull AI Coworker is the Key to Success: Embracing RPA’s Wisdom