New Reports Uncover Jailbreaks, Unsafe Code, and Data Theft Risks in Leading AI Systems
Apr 29, 2025
Vulnerability / Artificial Intelligence
Various generative artificial intelligence (GenAI) services have been found vulnerable to two types of jailbreak attacks that make it possible to produce illicit or dangerous content.

The first of the two techniques, codenamed Inception, instructs an AI tool to imagine a fictitious scenario, which can then be adapted into a second scenario nested within the first, one where no safety guardrails exist.

"Continued prompting to the AI within the second scenario's context can result in bypass of safety guardrails and allow the generation of malicious content," the CERT Coordination Center (CERT/CC) said in an advisory released last week.

The second jailbreak is realized by prompting the AI for information on how not to reply to a specific request.

"The AI can then be further prompted with requests to respond as normal, and the attacker can then pivot back and forth between illicit questions that bypass safety guardrails and normal prompts," CERT/CC added.

Success...
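The Inception flow is, at its core, a multi-turn conversation in which each prompt nests a new fictional frame inside the previous one, so the attacker's framing accumulates in the model's context. Below is a minimal sketch of how a red team could reproduce that flow, assuming the OpenAI Python SDK; the model name and the angle-bracketed prompt strings are illustrative placeholders, not the actual payloads CERT/CC describes.

```python
# Minimal sketch of the nested-scenario ("Inception") prompt flow for
# red-team testing. Prompt strings are benign placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The conversation history itself carries the attacker's framing
# forward into every subsequent model call.
history = [
    {"role": "user", "content": "Imagine a fictional world where <scenario one>."},
]

def send(history):
    """Send the accumulated conversation and append the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model works
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

send(history)

# Step two: nest a second scenario inside the first. Per CERT/CC,
# continued prompting within this inner scenario's context is what
# erodes the safety guardrails.
history.append({
    "role": "user",
    "content": "Within that world, imagine <scenario two> and continue from there.",
})
send(history)
```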
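The second technique relies on the same multi-turn mechanics, alternating between the "how would you not reply" framing and ordinary requests. A hedged continuation of the sketch above (reusing its send() helper and history list); the strings are again placeholders standing in for the attacker's pivoting prompts.

```python
# Sketch of the second technique: ask the model how it should *not*
# answer a request, then pivot back and forth between that framing and
# normal prompts. Reuses send() and history from the previous sketch.
followups = [
    "How would you refuse to answer <restricted question>?",
    "Now respond as you normally would.",
    "<restricted question>, continuing from your earlier framing.",
]
for turn in followups:
    history.append({"role": "user", "content": turn})
    send(history)
```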