#1 Trusted Cybersecurity News Platform
Followed by 5.20+ million
The Hacker News Logo
Subscribe – Get Latest News
Salesforce Security Handbook

AI Safety | Breaking Cybersecurity News | The Hacker News

Category — AI Safety
OpenAI Disrupts Russian, North Korean, and Chinese Hackers Misusing ChatGPT for Cyberattacks

OpenAI Disrupts Russian, North Korean, and Chinese Hackers Misusing ChatGPT for Cyberattacks

Oct 08, 2025 Artificial Intelligence / Threat Intelligence
OpenAI on Tuesday said it disrupted three activity clusters for misusing its ChatGPT artificial intelligence (AI) tool to facilitate malware development. This includes a Russian‑language threat actor, who is said to have used the chatbot to help develop and refine a remote access trojan (RAT), a credential stealer with an aim to evade detection. The operator also used several ChatGPT accounts to prototype and troubleshoot technical components that enable post‑exploitation and credential theft. "These accounts appear to be affiliated with Russian-speaking criminal groups, as we observed them posting evidence of their activities in a Telegram channel dedicated to those actors," OpenAI said. The AI company said while its large language models (LLMs) refused the threat actor's direct requests to produce malicious content, they worked around the limitation by creating building-block code, which was then assembled to create the workflows. Some of the produced output invo...
ThreatsDay Bulletin: CarPlay Exploit, BYOVD Tactics, SQL C2 Attacks, iCloud Backdoor Demand & More

ThreatsDay Bulletin: CarPlay Exploit, BYOVD Tactics, SQL C2 Attacks, iCloud Backdoor Demand & More

Oct 02, 2025 Threat Intelligence / Cyber Attacks
From unpatched cars to hijacked clouds, this week's Threatsday headlines remind us of one thing — no corner of technology is safe. Attackers are scanning firewalls for critical flaws, bending vulnerable SQL servers into powerful command centers, and even finding ways to poison Chrome's settings to sneak in malicious extensions. On the defense side, AI is stepping up to block ransomware in real time, but privacy fights over data access and surveillance are heating up just as fast. It's a week that shows how wide the battlefield has become — from the apps on our phones to the cars we drive. Don't keep this knowledge to yourself: share this bulletin to protect others, and add The Hacker News to your Google News list so you never miss the updates that could make the difference. Claude Now Finds Your Bugs Anthropic Touts Safety Protections Built Into Claude Sonnet 4.6 Anthropic said it has rolled out a number of safety and security improve...
New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes

New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes

Jun 12, 2025 AI Jailbreaking / Prompt Injection
Cybersecurity researchers have discovered a novel attack technique called TokenBreak that can be used to bypass a large language model's (LLM) safety and content moderation guardrails with just a single character change. "The TokenBreak attack targets a text classification model's tokenization strategy to induce false negatives, leaving end targets vulnerable to attacks that the implemented protection model was put in place to prevent," Kieran Evans, Kasimir Schulz, and Kenneth Yeung said in a report shared with The Hacker News. Tokenization is a fundamental step that LLMs use to break down raw text into their atomic units – i.e., tokens – which are common sequences of characters found in a set of text. To that end, the text input is converted into their numerical representation and fed to the model.  LLMs work by understanding the statistical relationships between these tokens, and produce the next token in a sequence of tokens. The output tokens are detokeni...
cyber security

New Webinar: Analyzing Real-world ClickFix Attacks

websitePush SecurityBrowser Security / Threat Detection
Learn how ClickFix-style attacks are bypassing detection controls, and what security teams can do about it.
cyber security

Weaponized GenAI + Extortion-First Strategies Fueling a New Age of Ransomware

websiteZscalerRansomware / Endpoint Security
Trends and insights based on expert analysis of public leak sites, ransomware samples and attack data.
New Reports Uncover Jailbreaks, Unsafe Code, and Data Theft Risks in Leading AI Systems

New Reports Uncover Jailbreaks, Unsafe Code, and Data Theft Risks in Leading AI Systems

Apr 29, 2025 Vulnerability / Artificial Intelligence
Various generative artificial intelligence (GenAI) services have been found vulnerable to two types of jailbreak attacks that make it possible to produce illicit or dangerous content. The first of the two techniques, codenamed Inception, instructs an AI tool to imagine a fictitious scenario, which can then be adapted into a second scenario within the first one where there exists no safety guardrails . "Continued prompting to the AI within the second scenarios context can result in bypass of safety guardrails and allow the generation of malicious content," the CERT Coordination Center (CERT/CC) said in an advisory released last week. The second jailbreak is realized by prompting the AI for information on how not to reply to a specific request.  "The AI can then be further prompted with requests to respond as normal, and the attacker can then pivot back and forth between illicit questions that bypass safety guardrails and normal prompts," CERT/CC added. Success...
Researchers Reveal 'Deceptive Delight' Method to Jailbreak AI Models

Researchers Reveal 'Deceptive Delight' Method to Jailbreak AI Models

Oct 23, 2024 Artificial Intelligence / Vulnerability
Cybersecurity researchers have shed light on a new adversarial technique that could be used to jailbreak large language models (LLMs) during the course of an interactive conversation by sneaking in an undesirable instruction between benign ones. The approach has been codenamed Deceptive Delight by Palo Alto Networks Unit 42, which described it as both simple and effective, achieving an average attack success rate (ASR) of 64.6% within three interaction turns. "Deceptive Delight is a multi-turn technique that engages large language models (LLM) in an interactive conversation, gradually bypassing their safety guardrails and eliciting them to generate unsafe or harmful content," Unit 42's Jay Chen and Royce Lu said. It's also a little different from multi-turn jailbreak (aka many-shot jailbreak) methods like Crescendo , wherein unsafe or restricted topics are sandwiched between innocuous instructions, as opposed to gradually leading the model to produce harmful outpu...
Expert Insights Articles Videos
Cybersecurity Resources