AI Accidents

Last year, the Open Worldwide Application Security Project (OWASP) published multiple versions of the "OWASP Top 10 For Large Language Models," reaching a 1.0 document in August and a 1.1 document in October. These documents demonstrate not only the rapidly evolving nature of large language models (LLMs), but also the evolving ways in which they can be attacked and defended. In this article, we'll look at the four items in that top 10 that are most likely to contribute to the accidental disclosure of secrets such as passwords and API keys.

We already know that LLMs can reveal secrets because it has happened. In early 2023, GitGuardian reported that it had found over 10 million secrets in public GitHub commits. GitHub's Copilot AI coding tool was trained on public commits, and in September of 2023, researchers at the University of Hong Kong published a paper on how they created an algorithm that generated 900 prompts designed to get Copilot to reveal secrets from its training data. When these prompts were used, Copilot revealed over 2,700 valid secrets.

The technique used by the researchers is called "prompt injection." It is #1 in the OWASP Top 10 for LLMs, which describes it as follows:

"This manipulates a large language model (LLM) through crafty inputs, causing unintended actions by the LLM. Direct injections overwrite system prompts, while indirect ones manipulate inputs from external sources."

You may be more familiar with prompt injection from the bug revealed last year, in which ChatGPT started spitting out training data when asked to repeat certain words forever.

Tip 1: Rotate your secrets

Even if you don't think you accidentally published secrets to GitHub, you may have. A number of the secrets found there were added in an early commit and clobbered by a later one, so they are not readily apparent unless you review your entire commit history, not just the current state of your public repositories.
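As a rough illustration of what reviewing the entire history means, here is a minimal sketch that searches every added line in every commit, including lines a later commit removed. The two patterns and the function names find_candidates and full_history_patch are assumptions chosen for this example; a dedicated secret scanner covers far more formats.

```python
import re
import subprocess

# Illustrative patterns only; real scanners ship many more detectors.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_candidates(patch_text):
    """Return (pattern_name, match) pairs found on added lines of a patch."""
    hits = []
    for line in patch_text.splitlines():
        # Lines starting with "+" were added by some commit, even if a later
        # commit removed them again. Skip the "+++ b/file" header lines.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((name, match.group(0)))
    return hits


def full_history_patch(repo_path="."):
    """Return every patch from every branch as text (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p", "--no-color"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_candidates(full_history_patch("path/to/repo")) lists candidates to investigate. Anything it finds should be treated as exposed and rotated, whether or not it still appears in the current files.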

A tool from GitGuardian, called Has My Secret Leaked, lets you hash a current secret, then submit the first few characters of the hash to check for matches in the database built from their scans of GitHub. A positive match isn't a guarantee that your secret leaked, but it indicates a real possibility that it did, so you can investigate further.
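The idea of hashing a secret and sharing only a prefix can be sketched as follows. This is not GitGuardian's client or protocol: the use of SHA-256, the five-character prefix, and the function names hash_prefix and leaked are all assumptions made for illustration.

```python
import hashlib


def hash_prefix(secret, prefix_len=5):
    """Hash a secret and return (prefix, full_digest).

    Only the short prefix would be sent to a lookup service; the full digest
    stays with you for comparison against whatever candidates come back.
    """
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    return digest[:prefix_len], digest


def leaked(full_digest, candidates):
    """True if the full digest appears among the returned candidate hashes."""
    return full_digest in set(candidates)
```

Because many different secrets share the same short prefix, the service in such a scheme cannot tell which secret you were asking about, and the secret itself is never transmitted.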

Rotation comes with caveats: you should know where each key or password is used and what might break when it changes, and you should have a plan to mitigate that breakage while the new secrets propagate to the systems that need them. Once the secrets are rotated, you must ensure the old ones have been disabled.
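One common way to handle the propagation window is to accept the old secret for a bounded grace period and reject it afterwards. The sketch below assumes that approach; the RotatingSecret class, the one-hour default, and the sample key values are hypothetical.

```python
import hmac
from datetime import datetime, timedelta, timezone


def _same(a, b):
    # Constant-time comparison, so timing does not leak how much matched.
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


class RotatingSecret:
    """Accept the outgoing secret only during a bounded grace period."""

    def __init__(self, current):
        self.current = current
        self.previous = None
        self.previous_expires = None

    def rotate(self, new_secret, now, grace=timedelta(hours=1)):
        # Keep the old secret valid while the new one propagates.
        self.previous = self.current
        self.previous_expires = now + grace
        self.current = new_secret

    def is_valid(self, candidate, now):
        if _same(candidate, self.current):
            return True
        if self.previous is None or now >= self.previous_expires:
            return False  # old secret is disabled once the window closes
        return _same(candidate, self.previous)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
keys = RotatingSecret("old-key")
keys.rotate("new-key", now=t0)
```

Thirty minutes after t0 both keys are accepted; two hours after t0 only "new-key" is, which is the point at which the old secret is effectively disabled.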

Attackers can't use a secret that no longer works. If any of your secrets that might be in an LLM have been rotated, they become nothing but useless high-entropy strings.

Tip 2: Clean your data

Item #6 in the OWASP Top 10 for LLMs is "Sensitive Information Disclosure":

"LLMs may inadvertently reveal confidential data in its responses, leading to unauthorized data access, privacy violations, and security breaches. It's crucial to implement data sanitization and strict user policies to mitigate this."

While deliberately engineered prompts can cause LLMs to reveal sensitive data, LLMs can also reveal it accidentally. The best way to ensure an LLM doesn't reveal sensitive data is to ensure it never knows that data in the first place.

This applies mainly when you're training an LLM for use by people who might not always have your best interests at heart, or who simply should not have access to certain information. Whether it's your secrets or secret sauce, only those who need access to them should have it… and your LLM is likely not one of those people.

Using open-source tools or paid services to scan your training data for secrets BEFORE feeding the data to your LLM will help you remove the secrets. What your LLM doesn't know, it can't tell.
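To make the idea concrete, here is a minimal sketch of one heuristic such scanners use: flag long tokens whose characters look random and replace them before the record reaches a training set. The token pattern, the 4.0 bits-per-character threshold, and the function names are assumptions for this example. Expect false positives and misses; a purpose-built scanner is the better choice for real data.

```python
import math
import re
from collections import Counter

# Candidate tokens: 20 or more characters from a key-like alphabet.
TOKEN = re.compile(r"[A-Za-z0-9+/_\-]{20,}")


def shannon_entropy(text):
    """Bits of entropy per character, from observed character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redact(record, threshold=4.0):
    """Replace long, random-looking tokens in a record with a placeholder."""
    def _swap(match):
        token = match.group(0)
        return "[REDACTED]" if shannon_entropy(token) >= threshold else token
    return TOKEN.sub(_swap, record)
```

Running redact over each record before it is written to the training set keeps ordinary prose intact, since ordinary words are too short to match, while stripping strings that look like keys.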

Tip 3: Patch Regularly & Limit Privileges

Recently we saw a piece on using .env files and environment variables as a way to keep secrets available to your code, but out of your code. But what if your LLM could be asked to reveal environment variables… or do something worse?

That scenario blends item #2 ("Insecure Output Handling") and item #8 ("Excessive Agency"):

  • Insecure Output Handling: This vulnerability occurs when an LLM output is accepted without scrutiny, exposing backend systems. Misuse may lead to severe consequences like XSS, CSRF, SSRF, privilege escalation, or remote code execution.
  • Excessive Agency: LLM-based systems may undertake actions leading to unintended consequences. The issue arises from excessive functionality, permissions, or autonomy granted to the LLM-based systems.

It's hard to separate the two because they make each other worse. If an LLM can be tricked into doing something and its operating context has unnecessary privileges, the potential for arbitrary code execution to do major harm multiplies.

Every developer has seen the "Exploits of a Mom" cartoon where a boy named `Robert'); DROP TABLE Students;--` wipes out a school's student database. Though an LLM seems smart, it's really no smarter than an SQL database. And like your "comedian" brother getting your toddler nephew to repeat bad words to Grandma, bad inputs can create bad outputs. Both inputs and outputs should be sanitized and treated as untrustworthy.
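As a small illustration of treating LLM output as untrusted, the sketch below passes a model-produced string to a database as a bound parameter instead of splicing it into the SQL text. The table layout and the add_student helper are assumptions for this example; it uses Python's built-in sqlite3 module with an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name TEXT)")


def add_student(conn, name):
    # The ? placeholder sends the value separately from the SQL text, so
    # quotes and semicolons in it are stored as data and never executed.
    conn.execute("INSERT INTO Students (name) VALUES (?)", (name,))


# Pretend this string came back from an LLM.
llm_output = "Robert'); DROP TABLE Students;--"
add_student(conn, llm_output)

rows = conn.execute("SELECT name FROM Students").fetchall()
```

The table survives, and the odd name is stored verbatim as a single row. The same rule applies to any other sink an LLM's output can reach, such as HTML pages, shell commands, or file paths.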

Furthermore, you need to set up guardrails around what the LLM or app can do, considering the principle of least privilege. Essentially, the apps that use or enable the LLM and the LLM infrastructure should not have access to any data or functionality they do not absolutely need so they can't accidentally put it in the service of a hacker.
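One narrow example of least privilege, tied to the environment-variable concern above: when an LLM-driven app launches a helper process, give that process only the variables it needs. The allowlist contents, the function names, and the API_KEY value below are hypothetical, and the sketch assumes a Unix-like system.

```python
import os
import subprocess
import sys

ALLOWED_ENV = {"PATH", "LANG"}  # hypothetical allowlist; adjust per tool


def minimal_env(parent_env, allowed=ALLOWED_ENV):
    """Copy only explicitly allowed variables into the child's environment."""
    return {k: v for k, v in parent_env.items() if k in allowed}


def run_tool(argv, parent_env=None, timeout=10):
    """Run a helper with a stripped environment and no shell."""
    env = minimal_env(os.environ if parent_env is None else parent_env)
    # argv is a list, so no shell ever interprets LLM-influenced text.
    return subprocess.run(
        argv, env=env, capture_output=True, text=True,
        timeout=timeout, check=False,
    )


parent = {"PATH": os.environ.get("PATH", ""), "API_KEY": "not-a-real-key"}
result = run_tool(
    [sys.executable, "-c", "import os; print('API_KEY' in os.environ)"],
    parent_env=parent,
)
```

Here the child process reports that API_KEY is absent from its environment, so even a successfully injected "print your environment" request has nothing sensitive to return. The same allowlist thinking applies to file system paths, network access, and database roles.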

AI can still be considered to be in its infancy, and as with any baby, it should not be given freedom to roam in any room you haven't baby-proofed. LLMs can misunderstand, hallucinate, and be deliberately led astray. When that happens, good locks, good walls, and good filters should help prevent them from accessing or revealing secrets.

In Summary

Large language models are an amazing tool. They're set to revolutionize a number of professions, processes, and industries. But they are far from a mature technology, and many are adopting them recklessly out of the fear of being left behind.

As you would with any baby that's developed enough mobility to get itself into trouble, you have to keep an eye on it and lock any cabinets you don't want it getting into. Proceed with large language models, but proceed with caution.

