A recent study by a KAIST research team in Korea reveals a new security vulnerability in Large Language Models (LLMs) built on the Mixture-of-Experts (MoE) architecture, with significant implications for AI safety. Major LLMs, such as Google’s Gemini, use this architecture to improve efficiency by routing each input to a small set of specialized ‘expert’ sub-models selected by context. The researchers discovered, however, that an attacker who introduces just a single manipulated expert model can make the system generate harmful responses, exposing a serious risk even without direct access to the model’s internals. Presented at the ACSAC 2025 conference, where it received the Best Paper award, the work shows that the attack can raise the rate of harmful outputs from 0% to 80% while causing negligible performance degradation, which makes it difficult to detect. Professors Seungwon Shin and Sooel Son stress the pressing need for rigorous verification of expert models, underscoring how security threats grow alongside AI development and the importance of sound AI security practices.
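The attack targets the MoE routing step, in which a lightweight gating network decides which experts process each token. The sketch below is purely illustrative (plain NumPy with made-up names and dimensions, not the paper's code or Gemini's implementation); it shows why a single tampered expert can be hard to notice: it only affects the tokens the router happens to send to it, so aggregate behavior barely changes.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 16, 4, 1  # hidden size, number of experts, experts used per token

# Each "expert" here is just a linear map; in a real MoE layer it is a full feed-forward block.
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]
router_w = rng.normal(size=(D, N_EXPERTS))  # gating-network weights

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and combine their outputs."""
    logits = x @ router_w                         # (tokens, experts) gating scores
    top = np.argsort(-logits, axis=-1)[:, :TOP_K] # indices of the selected experts per token
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        for e in top[t]:
            gate = np.exp(logits[t, e]) / np.exp(logits[t]).sum()  # softmax routing weight
            out[t] += gate * (token @ experts[e])
    return out

# The attack surface the study points at: replacing one entry in `experts` with a
# tampered version only changes the tokens routed to that expert, so overall
# benchmark scores can stay nearly unchanged while targeted behavior is altered.
tokens = rng.normal(size=(8, D))
print(moe_layer(tokens).shape)  # (8, 16)
```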