#1 Trusted Cybersecurity News Platform
Followed by 5.20+ million
The Hacker News Logo
Subscribe – Get Latest News
AWS EKS Security Best Practices

AI Safety | Breaking Cybersecurity News | The Hacker News

Category — AI Safety
New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes

New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes

Jun 12, 2025 AI Jailbreaking / Prompt Injection
Cybersecurity researchers have discovered a novel attack technique called TokenBreak that can be used to bypass a large language model's (LLM) safety and content moderation guardrails with just a single character change. "The TokenBreak attack targets a text classification model's tokenization strategy to induce false negatives, leaving end targets vulnerable to attacks that the implemented protection model was put in place to prevent," Kieran Evans, Kasimir Schulz, and Kenneth Yeung said in a report shared with The Hacker News. Tokenization is a fundamental step that LLMs use to break down raw text into their atomic units – i.e., tokens – which are common sequences of characters found in a set of text. To that end, the text input is converted into their numerical representation and fed to the model.  LLMs work by understanding the statistical relationships between these tokens, and produce the next token in a sequence of tokens. The output tokens are detokeni...
New Reports Uncover Jailbreaks, Unsafe Code, and Data Theft Risks in Leading AI Systems

New Reports Uncover Jailbreaks, Unsafe Code, and Data Theft Risks in Leading AI Systems

Apr 29, 2025 Vulnerability / Artificial Intelligence
Various generative artificial intelligence (GenAI) services have been found vulnerable to two types of jailbreak attacks that make it possible to produce illicit or dangerous content. The first of the two techniques, codenamed Inception, instructs an AI tool to imagine a fictitious scenario, which can then be adapted into a second scenario within the first one where there exists no safety guardrails . "Continued prompting to the AI within the second scenarios context can result in bypass of safety guardrails and allow the generation of malicious content," the CERT Coordination Center (CERT/CC) said in an advisory released last week. The second jailbreak is realized by prompting the AI for information on how not to reply to a specific request.  "The AI can then be further prompted with requests to respond as normal, and the attacker can then pivot back and forth between illicit questions that bypass safety guardrails and normal prompts," CERT/CC added. Success...
Researchers Reveal 'Deceptive Delight' Method to Jailbreak AI Models

Researchers Reveal 'Deceptive Delight' Method to Jailbreak AI Models

Oct 23, 2024 Artificial Intelligence / Vulnerability
Cybersecurity researchers have shed light on a new adversarial technique that could be used to jailbreak large language models (LLMs) during the course of an interactive conversation by sneaking in an undesirable instruction between benign ones. The approach has been codenamed Deceptive Delight by Palo Alto Networks Unit 42, which described it as both simple and effective, achieving an average attack success rate (ASR) of 64.6% within three interaction turns. "Deceptive Delight is a multi-turn technique that engages large language models (LLM) in an interactive conversation, gradually bypassing their safety guardrails and eliciting them to generate unsafe or harmful content," Unit 42's Jay Chen and Royce Lu said. It's also a little different from multi-turn jailbreak (aka many-shot jailbreak) methods like Crescendo , wherein unsafe or restricted topics are sandwiched between innocuous instructions, as opposed to gradually leading the model to produce harmful outpu...
cyber security

Master SaaS AI Risk: Your Complete Governance Playbook

websiteReco AIArtificial Intelligence / SaaS Security
95% use AI, but is it secure? Master SaaS AI governance with standards-aligned frameworks.
Watch This Webinar to Uncover Hidden Flaws in Login, AI, and Digital Trust — and Fix Them

Designing Identity for Trust at Scale—With Privacy, AI, and Seamless Logins in Mind

Jul 24, 2025
Is Managing Customer Logins and Data Giving You Headaches? You're Not Alone! Today, we all expect super-fast, secure, and personalized online experiences. But let's be honest, we're also more careful about how our data is used. If something feels off, trust can vanish in an instant. Add to that the lightning-fast changes AI is bringing to everything from how we log in to spotting online fraud, and it's a whole new ball game! If you're dealing with logins, data privacy, bringing new users on board, or building digital trust, this webinar is for you . Join us for " Navigating Customer Identity in the AI Era ," where we'll dive into the Auth0 2025 Customer Identity Trends Report . We'll show you what's working, what's not, and how to tweak your strategy for the year ahead. In just one session, you'll get practical answers to real-world challenges like: How AI is changing what users expect – and where they're starting to push ba...
Expert Insights Articles Videos
Cybersecurity Resources
//]]>