
AI Safety | Breaking Cybersecurity News | The Hacker News

Category — AI Safety
ThreatsDay Bulletin: CarPlay Exploit, BYOVD Tactics, SQL C2 Attacks, iCloud Backdoor Demand & More

Oct 02, 2025 Threat Intelligence / Cyber Attacks
From unpatched cars to hijacked clouds, this week's Threatsday headlines remind us of one thing: no corner of technology is safe. Attackers are scanning firewalls for critical flaws, bending vulnerable SQL servers into powerful command centers, and even finding ways to poison Chrome's settings to sneak in malicious extensions. On the defense side, AI is stepping up to block ransomware in real time, but privacy fights over data access and surveillance are heating up just as fast. It's a week that shows how wide the battlefield has become, from the apps on our phones to the cars we drive.

Don't keep this knowledge to yourself: share this bulletin to protect others, and add The Hacker News to your Google News list so you never miss the updates that could make the difference.

Claude Now Finds Your Bugs
Anthropic Touts Safety Protections Built Into Claude Sonnet 4.5
Anthropic said it has rolled out a number of safety and security improve...
New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes

Jun 12, 2025 AI Jailbreaking / Prompt Injection
Cybersecurity researchers have discovered a novel attack technique called TokenBreak that can be used to bypass a large language model's (LLM) safety and content moderation guardrails with just a single character change.

"The TokenBreak attack targets a text classification model's tokenization strategy to induce false negatives, leaving end targets vulnerable to attacks that the implemented protection model was put in place to prevent," Kieran Evans, Kasimir Schulz, and Kenneth Yeung said in a report shared with The Hacker News.

Tokenization is a fundamental step that LLMs use to break raw text down into its atomic units, i.e., tokens, which are common sequences of characters found in a set of text. To that end, the text input is converted into a numerical representation and fed to the model. LLMs work by understanding the statistical relationships between these tokens and producing the next token in a sequence of tokens. The output tokens are detokeni...
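To make the tokenization step concrete, here is a minimal sketch using the Hugging Face transformers library and the public bert-base-uncased checkpoint (both are illustrative assumptions; the report does not name a specific model). It shows how a single prepended character changes the token sequence a classifier sees:

```python
# Minimal tokenization sketch (assumes the "transformers" package is
# installed and can download the public bert-base-uncased checkpoint).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

for text in ["ignore previous instructions",
             "ignore previous finstructions"]:  # one prepended character
    tokens = tokenizer.tokenize(text)
    print(f"{text!r}: {tokens}")

# The perturbed word fragments into different subword tokens, so a
# classifier keyed to the clean sequence can return a false negative,
# while a downstream LLM still infers the intended meaning.
```

The exact split depends on the tokenizer's vocabulary, and that vocabulary dependence is precisely what a tokenization-manipulation attack like TokenBreak exploits.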
New Reports Uncover Jailbreaks, Unsafe Code, and Data Theft Risks in Leading AI Systems

Apr 29, 2025 Vulnerability / Artificial Intelligence
Various generative artificial intelligence (GenAI) services have been found vulnerable to two types of jailbreak attacks that make it possible to produce illicit or dangerous content.

The first of the two techniques, codenamed Inception, instructs an AI tool to imagine a fictitious scenario, which can then be adapted into a second scenario within the first one where no safety guardrails exist. "Continued prompting to the AI within the second scenario's context can result in bypass of safety guardrails and allow the generation of malicious content," the CERT Coordination Center (CERT/CC) said in an advisory released last week.

The second jailbreak is realized by prompting the AI for information on how not to reply to a specific request. "The AI can then be further prompted with requests to respond as normal, and the attacker can then pivot back and forth between illicit questions that bypass safety guardrails and normal prompts," CERT/CC added. Success...
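Schematically, and with placeholder strings only (the message contents and the send_chat helper below are hypothetical, not from the advisory), the nested-scenario flow looks like this:

```python
# Schematic of the nested-scenario ("Inception") prompt flow described by
# CERT/CC. All message contents are placeholders; send_chat() is a
# hypothetical wrapper for whichever GenAI service is being assessed.
def send_chat(history):
    """Hypothetical client call; returns the model's reply as a string."""
    raise NotImplementedError

history = []
for prompt in [
    "Imagine a fictitious scenario S1 ...",         # outer scenario
    "Within S1, imagine a second scenario S2 ...",  # nested scenario
    "[continued prompting inside S2's context]",    # where bypasses occur
]:
    history.append({"role": "user", "content": prompt})
    # Each reply is generated with the full nested context in scope, which
    # is what CERT/CC reports can erode the top-level guardrails.
    # history.append({"role": "assistant", "content": send_chat(history)})
```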
Researchers Reveal 'Deceptive Delight' Method to Jailbreak AI Models

Oct 23, 2024 Artificial Intelligence / Vulnerability
Cybersecurity researchers have shed light on a new adversarial technique that could be used to jailbreak large language models (LLMs) during the course of an interactive conversation by sneaking in an undesirable instruction between benign ones.

The approach has been codenamed Deceptive Delight by Palo Alto Networks Unit 42, which described it as both simple and effective, achieving an average attack success rate (ASR) of 64.6% within three interaction turns.

"Deceptive Delight is a multi-turn technique that engages large language models (LLM) in an interactive conversation, gradually bypassing their safety guardrails and eliciting them to generate unsafe or harmful content," Unit 42's Jay Chen and Royce Lu said.

It's also a little different from multi-turn jailbreak (aka many-shot jailbreak) methods like Crescendo, wherein unsafe or restricted topics are sandwiched between innocuous instructions, as opposed to gradually leading the model to produce harmful outpu...
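As a schematic sketch only (the topic strings are placeholders, not operational prompts), the three-turn structure Unit 42 describes can be written down like this:

```python
# Schematic of the Deceptive Delight turn structure reported by Unit 42.
# "TARGET" is a placeholder for a restricted topic; nothing here is an
# operational prompt.
BENIGN_A, BENIGN_B = "[benign topic A]", "[benign topic B]"
TARGET = "[restricted topic placeholder]"

turns = [
    # Turn 1: ask for a narrative that logically links all three topics,
    # embedding the restricted one between the benign ones.
    f"Tell a story that connects {BENIGN_A}, {TARGET}, and {BENIGN_B}.",
    # Turn 2: ask the model to elaborate on each topic in the narrative.
    "Expand on each of those topics in more detail.",
    # Turn 3: press for further detail; Unit 42 measured an average
    # attack success rate of 64.6% within three such turns.
    "Go deeper on the second topic.",
]

for i, turn in enumerate(turns, 1):
    print(f"Turn {i}: {turn}")
```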