#1 Trusted Cybersecurity News Platform
Followed by 5.20+ million
The Hacker News Logo
Subscribe – Get Latest News

Search results for how to jailbreak AI | Breaking Cybersecurity News | The Hacker News

Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems

Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems

Aug 09, 2025 Generative AI / Vulnerability
Cybersecurity researchers have uncovered a jailbreak technique to bypass ethical guardrails erected by OpenAI in its latest large language model (LLM) GPT-5 and produce illicit instructions. Generative artificial intelligence (AI) security platform NeuralTrust said it combined a known technique called Echo Chamber with narrative-driven steering to trick the model into producing undesirable responses. "We use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling that avoids explicit intent signaling," security researcher Martí Jordà said . "This combination nudges the model toward the objective while minimizing triggerable refusal cues." Echo Chamber is a jailbreak approach that was detailed by the company back in June 2025 as a way to deceive an LLM into generating responses to prohibited topics using indirect references, semantic steering, and multi-step inference. In recent weeks, the...
Prompt Injection Flaw in Vanna AI Exposes Databases to RCE Attacks

Prompt Injection Flaw in Vanna AI Exposes Databases to RCE Attacks

Jun 27, 2024 Artificial Intelligence / Vulnerability
Cybersecurity researchers have disclosed a high-severity security flaw in the Vanna.AI library that could be exploited to achieve remote code execution vulnerability via prompt injection techniques. The vulnerability, tracked as CVE-2024-5565 (CVSS score: 8.1), relates to a case of prompt injection in the "ask" function that could be exploited to trick the library into executing arbitrary commands, supply chain security firm JFrog said . Vanna is a Python-based machine learning library that allows users to chat with their SQL database to glean insights by "just asking questions" (aka prompts) that are translated into an equivalent SQL query using a large language model (LLM). The rapid rollout of generative artificial intelligence (AI) models in recent years has brought to the fore the risks of exploitation by malicious actors, who can weaponize the tools by providing adversarial inputs that bypass the safety mechanisms built into them. One such prominent clas...
Researchers Reveal 'Deceptive Delight' Method to Jailbreak AI Models

Researchers Reveal 'Deceptive Delight' Method to Jailbreak AI Models

Oct 23, 2024 Artificial Intelligence / Vulnerability
Cybersecurity researchers have shed light on a new adversarial technique that could be used to jailbreak large language models (LLMs) during the course of an interactive conversation by sneaking in an undesirable instruction between benign ones. The approach has been codenamed Deceptive Delight by Palo Alto Networks Unit 42, which described it as both simple and effective, achieving an average attack success rate (ASR) of 64.6% within three interaction turns. "Deceptive Delight is a multi-turn technique that engages large language models (LLM) in an interactive conversation, gradually bypassing their safety guardrails and eliciting them to generate unsafe or harmful content," Unit 42's Jay Chen and Royce Lu said. It's also a little different from multi-turn jailbreak (aka many-shot jailbreak) methods like Crescendo , wherein unsafe or restricted topics are sandwiched between innocuous instructions, as opposed to gradually leading the model to produce harmful outpu...
cyber security

SANS Cyber Defense Initiative 2025

websiteSANS InstituteCyber Defense / ICS Security
Strengthen your cybersecurity skills in Washington, DC or Live Online (ET), Dec 12–17, 2025.
cyber security

2025 Gartner® MQ Report for Endpoint Protection Platforms (July 2025 Edition)

websiteSentinelOneEndpoint Protection / Unified Security
Compare leading Endpoint Protection vendors and see why SentinelOne is named a 5x Leader.
⚡ Weekly Recap: BadCam Attack, WinRAR 0-Day, EDR Killer, NVIDIA Flaws, Ransomware Attacks & More

⚡ Weekly Recap: BadCam Attack, WinRAR 0-Day, EDR Killer, NVIDIA Flaws, Ransomware Attacks & More

Aug 11, 2025
This week, cyber attackers are moving quickly, and businesses need to stay alert. They're finding new weaknesses in popular software and coming up with clever ways to get around security. Even one unpatched flaw could let attackers in, leading to data theft or even taking control of your systems. The clock is ticking—if defenses aren't updated regularly, it could lead to serious damage. The message is clear: don't wait for an attack to happen. Take action now to protect your business. Here's a look at some of the biggest stories in cybersecurity this week: from new flaws in WinRAR and NVIDIA Triton to advanced attack techniques you should know about. Let's get into the details. ⚡ Threat of the Week Trend Micro Warns of Actively Exploited 0-Day — Trend Micro has released temporary mitigations to address critical security flaws in on-premise versions of Apex One Management Console that it said have been exploited in the wild. The vulnerabilities (CVE-2025-54948 and CVE-2025-54987),...
⚡ THN Weekly Recap: Top Cybersecurity Threats, Tools and Tips [3 February]

⚡ THN Weekly Recap: Top Cybersecurity Threats, Tools and Tips [3 February]

Feb 03, 2025 Cybersecurity / Recap
This week, our news radar shows that every new tech idea comes with its own challenges. A hot AI tool is under close watch, law enforcement is shutting down online spots that help cybercriminals, and teams are busy fixing software bugs that could let attackers in. From better locks on our devices to stopping sneaky tricks online, simple steps are making a big difference.  Let's take a closer look at how these efforts are shaping a safer digital world. ⚡ Threat of the Week DeepSeek's Popularity Invites Scrutiny — The overnight popularity of DeepSeek, an artificial intelligence (AI) platform originating from China, has led to extensive scrutiny of its models, with several analyses finding ways to jailbreak its system and produce malicious or prohibited content. While jailbreaks and prompt injections are a persistent concern in mainstream AI products, the findings also show that the model lacks enough protections to prevent potential abuse by malicious actors . The AI chatbot ha...
Lovable AI Found Most Vulnerable to VibeScamming — Enabling Anyone to Build Live Scam Pages

Lovable AI Found Most Vulnerable to VibeScamming — Enabling Anyone to Build Live Scam Pages

Apr 09, 2025 Artificial Intelligence / Web Security
Lovable , a generative artificial intelligence (AI) powered platform that allows for creating full-stack web applications using text-based prompts, has been found to be the most susceptible to jailbreak attacks, allowing novice and aspiring cybercrooks to set up lookalike credential harvesting pages. "As a purpose-built tool for creating and deploying web apps, its capabilities line up perfectly with every scammer's wishlist," Guardio Labs' Nati Tal said in a report shared with The Hacker News. "From pixel-perfect scam pages to live hosting, evasion techniques, and even admin dashboards to track stolen data – Lovable didn't just participate, it performed. No guardrails, no hesitation." The technique has been codenamed VibeScamming – a play on the term vibe coding, which refers to an AI-dependent programming technique to produce software by describing the problem statement in a few sentences as a prompt to a large language model (LLM) tuned for codin...
Apple Opens PCC Source Code for Researchers to Identify Bugs in Cloud AI Security

Apple Opens PCC Source Code for Researchers to Identify Bugs in Cloud AI Security

Oct 25, 2024 Cloud Security / Artificial Intelligence
Apple has publicly made available its Private Cloud Compute (PCC) Virtual Research Environment (VRE), allowing the research community to inspect and verify the privacy and security guarantees of its offering. PCC, which Apple unveiled earlier this June, has been marketed as the "most advanced security architecture ever deployed for cloud AI compute at scale." With the new technology, the idea is to offload computationally complex Apple Intelligence requests to the cloud in a manner that doesn't sacrifice user privacy. Apple said it's inviting "all security and privacy researchers — or anyone with interest and a technical curiosity — to learn more about PCC and perform their own independent verification of our claims." To further incentivize research, the iPhone maker said it's expanding the Apple Security Bounty program to include PCC by offering monetary payouts ranging from $50,000 to $1,000,000 for security vulnerabilities identified in it. Th...
New Reports Uncover Jailbreaks, Unsafe Code, and Data Theft Risks in Leading AI Systems

New Reports Uncover Jailbreaks, Unsafe Code, and Data Theft Risks in Leading AI Systems

Apr 29, 2025 Vulnerability / Artificial Intelligence
Various generative artificial intelligence (GenAI) services have been found vulnerable to two types of jailbreak attacks that make it possible to produce illicit or dangerous content. The first of the two techniques, codenamed Inception, instructs an AI tool to imagine a fictitious scenario, which can then be adapted into a second scenario within the first one where there exists no safety guardrails . "Continued prompting to the AI within the second scenarios context can result in bypass of safety guardrails and allow the generation of malicious content," the CERT Coordination Center (CERT/CC) said in an advisory released last week. The second jailbreak is realized by prompting the AI for information on how not to reply to a specific request.  "The AI can then be further prompted with requests to respond as normal, and the attacker can then pivot back and forth between illicit questions that bypass safety guardrails and normal prompts," CERT/CC added. Success...
Cursor AI Code Editor Vulnerability Enables RCE via Malicious MCP File Swaps Post Approval

Cursor AI Code Editor Vulnerability Enables RCE via Malicious MCP File Swaps Post Approval

Aug 05, 2025 AI Security / MCP Protocol
Cybersecurity researchers have disclosed a high-severity security flaw in the artificial intelligence (AI)-powered code editor Cursor that could result in remote code execution. The vulnerability, tracked as CVE-2025-54136 (CVSS score: 7.2), has been codenamed MCPoison by Check Point Research, owing to the fact that it exploits a quirk in the way the software handles modifications to Model Context Protocol (MCP) server configurations. "A vulnerability in Cursor AI allows an attacker to achieve remote and persistent code execution by modifying an already trusted MCP configuration file inside a shared GitHub repository or editing the file locally on the target's machine," Cursor said in an advisory released last week. "Once a collaborator accepts a harmless MCP, the attacker can silently swap it for a malicious command (e.g., calc.exe) without triggering any warning or re-prompt." MCP is an open-standard developed by Anthropic that allows large language mode...
GitLab Duo Vulnerability Enabled Attackers to Hijack AI Responses with Hidden Prompts

GitLab Duo Vulnerability Enabled Attackers to Hijack AI Responses with Hidden Prompts

May 23, 2025 Artificial Intelligence / Vulnerability
Cybersecurity researchers have discovered an indirect prompt injection flaw in GitLab's artificial intelligence (AI) assistant Duo that could have allowed attackers to steal source code and inject untrusted HTML into its responses, which could then be used to direct victims to malicious websites. GitLab Duo is an artificial intelligence (AI)-powered coding assistant that enables users to write, review, and edit code. Built using Anthropic's Claude models, the service was first launched in June 2023. But as Legit Security found , GitLab Duo Chat has been susceptible to an indirect prompt injection flaw that permits attackers to "steal source code from private projects, manipulate code suggestions shown to other users, and even exfiltrate confidential, undisclosed zero-day vulnerabilities." Prompt injection refers to a class of vulnerabilities common in AI systems that enable threat actors to weaponize large language models (LLMs) to manipulate responses to user...
12,000+ API Keys and Passwords Found in Public Datasets Used for LLM Training

12,000+ API Keys and Passwords Found in Public Datasets Used for LLM Training

Feb 28, 2025 Machine Learning / Data Privacy
A dataset used to train large language models (LLMs) has been found to contain nearly 12,000 live secrets, which allow for successful authentication. The findings once again highlight how hard-coded credentials pose a severe security risk to users and organizations alike, not to mention compounding the problem when LLMs end up suggesting insecure coding practices to their users. Truffle Security said it downloaded a December 2024 archive from Common Crawl , which maintains a free, open repository of web crawl data. The massive dataset contains over 250 billion pages spanning 18 years.  The archive specifically contains 400TB of compressed web data, 90,000 WARC files (Web ARChive format), and data from 47.5 million hosts across 38.3 million registered domains. The company's analysis found that there are 219 different secret types in the Common Crawl archive, including Amazon Web Services (AWS) root keys, Slack webhooks, and Mailchimp API keys. "'Live' secrets ar...
Expert Insights Articles Videos
Cybersecurity Resources
//]]>