Researchers Find ChatGPT Vulnerabilities That Let Attackers Trick AI Into Leaking Data

Nov 05, 2025Ravie LakshmananArtificial Intelligence / Vulnerability

Cybersecurity researchers have disclosed a new set of vulnerabilities impacting OpenAI's ChatGPT artificial intelligence (AI) chatbot that could be exploited by an attacker to steal personal information from users' memories and chat histories without their knowledge.

The seven vulnerabilities and attack techniques, according to Tenable, were found in OpenAI's GPT-4o and GPT-5 models. OpenAI has since addressed some of them.

These issues expose the AI system to indirect prompt injection attacks, allowing an attacker to manipulate the expected behavior of a large language model (LLM) and trick it into performing unintended or malicious actions, security researchers Moshe Bernstein and Liv Matan said in a report shared with The Hacker News.

The identified shortcomings are listed below -

Indirect prompt injection vulnerability via trusted sites in Browsing Context, which involves asking ChatGPT to summarize the contents of web pages with malicious instructions added in the comment section, causing the LLM to execute them
Zero-click indirect prompt injection vulnerability in Search Context, which involves tricking the LLM into executing malicious instructions simply by asking about a niche website in the form of a natural language query, owing to the fact that the site may have been indexed by search engines like Bing and OpenAI's crawler associated with SearchGPT.
Prompt injection vulnerability via one-click, which involves crafting a link in the format "chatgpt[.]com/?q={Prompt}," causing the LLM to automatically execute the query in the "q=" parameter when the URL is clicked
Safety mechanism bypass vulnerability, which takes advantage of the fact that the domain bing[.]com is allow-listed in ChatGPT as a safe URL to set up Bing ad tracking links (bing[.]com/ck/a) to mask malicious URLs and allow them to be rendered on the chat.
Conversation injection technique, which involves inserting malicious instructions into a website and asking ChatGPT to summarize the website, causing the LLM to respond to subsequent interactions with unintended replies due to the prompt being placed within the conversational context (i.e., the output from SearchGPT)
Malicious content hiding technique, which involves hiding malicious prompts by taking advantage of a bug resulting from how ChatGPT renders markdown that causes any data appearing on the same line denoting a fenced code block opening (```) after the first word to not be rendered
Memory injection technique, which involves poisoning a user's ChatGPT memory by concealing hidden instructions in a website and asking the LLM to summarize the site

The disclosure comes close on the heels of research demonstrating various kinds of prompt injection attacks against AI tools that are capable of bypassing safety and security guardrails -

A technique called PromptJacking that exploits three remote code execution vulnerabilities in Anthropic Claude's Chrome, iMessage, and Apple Notes connectors to achieve unsanitized command injection, resulting in prompt injection
A technique called Claude pirate that abuses Claude's Files API for data exfiltration by using indirect prompt injections that weaponize an oversight in Claude's network access controls
A technique called agent session smuggling that leverages the Agent2Agent (A2A) protocol and allows a malicious AI agent to exploit an established cross-agent communication session to inject additional instructions between a legitimate client request and the server's response, resulting in context poisoning, data exfiltration, or unauthorized tool execution
A technique called prompt inception that employs prompt injections to steer an AI agent to amplify bias or falsehoods, leading to disinformation at scale
A zero-click attack called shadow escape that can be used to steal sensitive data from interconnected systems by leveraging standard Model Context Protocol (MCP) setups and default MCP permissioning through specially crafted documents containing "shadow instructions" that trigger the behavior when uploaded to AI chatbots
An indirect prompt injection targeting Microsoft 365 Copilot that abuses the tool's built-in support for Mermaid diagrams for data exfiltration by taking advantage of its support for CSS
A vulnerability in GitHub Copilot Chat called CamoLeak (CVSS score: 9.6) that allows for covert exfiltration of secrets and source code from private repositories and full control over Copilot's responses by combining a Content Security Policy (CSP) bypass and remote prompt injection using hidden comments in pull requests
A white-box jailbreak attack called LatentBreak that generates natural adversarial prompts with low perplexity, which are capable of evading safety mechanisms by substituting words in the input prompt with semantically-equivalent ones and preserving the initial intent of the prompt

The findings show that exposing AI chatbots to external tools and systems, a key requirement for building AI agents, expands the attack surface by presenting more avenues for threat actors to conceal malicious prompts that end up being parsed by models.

Attack methods such as zero-click ChatGPT data exfiltration via indirect prompt injection also highlight the fundamental problem with LLMs' inability to distinguish between legitimate user instructions and attacker-controlled data ingested from external sources.

"Prompt injection is a known issue with the way that LLMs work, and, unfortunately, it will probably not be fixed systematically in the near future," Tenable researchers said. "AI vendors should take care to ensure that all of their safety mechanisms (such as url_safe) are working properly to limit the potential damage caused by prompt injection."

The development comes as a group of academics from Texas A&M, the University of Texas, and Purdue University found that training AI models on "junk data" can lead to LLM "brain rot," warning "heavily relying on Internet data leads LLM pre-training to the trap of content contamination."

Last month, a study from Anthropic, the U.K. AI Security Institute, and the Alan Turing Institute also discovered that it's possible to successfully backdoor AI models of different sizes (600M, 2B, 7B, and 13B parameters) using just 250 poisoned documents, upending previous assumptions that attackers needed to obtain control of a certain percentage of training data in order to tamper with a model's behavior.

From an attack standpoint, malicious actors could attempt to poison web content that's scraped for training LLMs, or they could create and distribute their own poisoned versions of open-source models.

"If attackers only need to inject a fixed, small number of documents rather than a percentage of training data, poisoning attacks may be more feasible than previously believed," Anthropic said. "Creating 250 malicious documents is trivial compared to creating millions, making this vulnerability far more accessible to potential attackers."

And that's not all. Another research from Stanford University scientists found that optimizing LLMs for competitive success in sales, elections, and social media can inadvertently drive misalignment, a phenomenon referred to as Moloch's Bargain.

"In line with market incentives, this procedure produces agents achieving higher sales, larger voter shares, and greater engagement," researchers Batu El and James Zou wrote in an accompanying paper published last month.

"However, the same procedure also introduces critical safety concerns, such as deceptive product representation in sales pitches and fabricated information in social media posts, as a byproduct. Consequently, when left unchecked, market competition risks turning into a race to the bottom: the agent improves performance at the expense of safety."

Found this article interesting? Follow us on Google News, Twitter and LinkedIn to read more exclusive content we post.