AI Agents and the Non‑Human Identity

Artificial intelligence is driving a massive shift in enterprise productivity, from GitHub Copilot's code completions to chatbots that mine internal knowledge bases for instant answers. Each new agent must authenticate to other services, quietly swelling the population of non‑human identities (NHIs) across corporate clouds.

That population is already overwhelming the enterprise: many companies now juggle at least 45 machine identities for every human user. Service accounts, CI/CD bots, containers, and AI agents all need secrets, most commonly API keys, tokens, or certificates, to connect securely to the systems they work with. GitGuardian's State of Secrets Sprawl 2025 report reveals the cost of this sprawl: over 23.7 million secrets surfaced on public GitHub in 2024 alone. And instead of making the situation better, repositories with Copilot enabled leaked secrets 40 percent more often.

NHIs Are Not People

Unlike human users logging into systems, NHIs rarely come with policies that mandate credential rotation, tightly scoped permissions, or decommissioning of unused accounts. Left unmanaged, they weave a dense, opaque web of high‑risk connections that attackers can exploit long after anyone remembers those secrets exist.

The adoption of AI, especially large language models and retrieval-augmented generation (RAG), has dramatically increased the speed and volume at which this risk-inducing sprawl can occur.

Consider an internal support chatbot powered by an LLM. When asked how to connect to a development environment, the bot might retrieve a Confluence page containing valid credentials. The chatbot can unwittingly expose secrets to anyone who asks the right question, and the logs can easily leak this info to whoever has access. Worse yet, in this scenario, the LLM is telling your developers to use this plaintext credential. The security issues can stack up quickly.
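To make that failure mode concrete, here is a minimal sketch of the chatbot's retrieval step. Everything in it is hypothetical: the search_confluence() helper, the page contents, and the credential are stand-ins, but they show how a plaintext secret flows from a knowledge base straight into the prompt and, from there, into answers and logs.

```python
# Hypothetical sketch of the RAG lookup described above; no real APIs or keys.
def search_confluence(query: str) -> str:
    # In a real pipeline this would query your knowledge base. Here it stands
    # in for a wiki page someone wrote years ago and forgot about.
    return (
        "To connect to the dev environment, use:\n"
        "API_KEY=sk_live_EXAMPLE_NOT_A_REAL_KEY\n"
    )

def build_prompt(question: str) -> str:
    context = search_confluence(question)  # the secret comes back as "context"
    return f"Answer using this context:\n{context}\nQuestion: {question}"

# The plaintext credential is now in the prompt, in the model's answer,
# and in every log line that captures either of them.
print(build_prompt("How do I connect to the dev environment?"))
```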

The situation is not hopeless, though. In fact, if proper governance models are implemented around NHIs and secrets management, then developers can actually innovate and deploy faster.

Five Actionable Controls to Reduce AI‑Related NHI Risk

Organizations looking to control the risks of AI-driven NHIs should focus on these five actionable practices:

  1. Audit and Clean Up Data Sources
  2. Centralize Your Existing NHI Management
  3. Prevent Secrets Leaks In LLM Deployments
  4. Improve Logging Security
  5. Restrict AI Data Access

Let's take a closer look at each one of these areas.

Audit and Clean Up Data Sources

The first LLMs were bound only to the specific data sets they were trained on, making them novelties with limited capabilities. Retrieval-augmented generation (RAG) engineering changed this by allowing LLMs to access additional data sources as needed. Unfortunately, if secrets are present in those sources, the related identities are now at risk of being abused.

Data sources, including project management platforms like Jira, communication platforms like Slack, and knowledge bases such as Confluence, weren't built with AI or secrets in mind. If someone adds a plaintext API key, there are no safeguards to alert them that this is dangerous. A chatbot can easily become a secrets-leaking engine with the right prompting.

The only surefire way to prevent your LLM from leaking internal secrets is to eliminate the secrets themselves, or at least revoke the access they carry; an invalidated credential carries no immediate risk if it leaks. Ideally, you remove every instance of a secret before your AI can ever retrieve it. Fortunately, there are tools and platforms, like GitGuardian, that can make this process as painless as possible.
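As a rough illustration of what "clean up before indexing" can look like, the sketch below gates documents behind a couple of secret-shaped regexes before they reach a RAG index. The patterns and the ingest() loop are simplified stand-ins; a dedicated scanner such as GitGuardian ships far more detectors and validity checks.

```python
import re

# Simplified stand-in for a real secrets scanner: two illustrative patterns.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                               # AWS access key ID shape
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"),  # generic key=value
]

def is_safe_to_index(document_text: str) -> bool:
    """Return False if the document looks like it contains a credential."""
    return not any(pattern.search(document_text) for pattern in SECRET_PATTERNS)

def ingest(documents: list[str], index: list[str]) -> None:
    # Hypothetical ingestion loop: anything that trips a detector is flagged
    # for cleanup and revocation instead of being handed to the LLM.
    for doc in documents:
        if is_safe_to_index(doc):
            index.append(doc)
        else:
            print("Flagged for review, not indexed:", doc[:40], "...")
```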

Centralize Your Existing NHI Management

The quote "If you cannot measure it, you cannot improve it" is most often attributed to Lord Kelvin. It holds true for non-human identity governance: without taking stock of all the service accounts, bots, agents, and pipelines you currently have, there is little hope of applying effective rules and scopes to the new NHIs your agentic AI brings with it.

The one thing all those types of non-human identities have in common is a secret. However you define NHI, the authentication mechanism is the same: the secret. Viewed through this lens, the inventory problem collapses into the proper storage and management of secrets, which is far from a new concern.

There are plenty of tools that can make this achievable, like HashiCorp Vault, CyberArk, or AWS Secrets Manager. Once secrets are centrally managed and accounted for, you can move from a world of long-lived credentials toward one where rotation is automated and enforced by policy.
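As a sketch of what centralization buys you in practice, the snippet below pulls a credential from AWS Secrets Manager at runtime instead of hardcoding it. The secret name prod/billing-api is hypothetical, and the same pattern applies to Vault or CyberArk equivalents.

```python
import boto3

def get_billing_api_key() -> str:
    # Resolve the credential at runtime from the central store. The secret
    # name "prod/billing-api" is an example; create and scope your own.
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId="prod/billing-api")
    return response["SecretString"]

# The key now lives in one governed place where rotation can be automated
# and enforced by policy, rather than being pasted into code or a wiki page.
api_key = get_billing_api_key()
```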

Prevent Secrets Leaks In LLM Deployments

Model Context Protocol (MCP) servers are the new standard for how agentic AI accesses services and data sources. Previously, if you wanted to configure an AI system to access a resource, you had to wire it together yourself, figuring it out as you went. MCP gives AI a standardized interface for connecting to service providers. This simplifies integrations and lessens the chance that a developer will hardcode a credential just to get one working.

In one of the more alarming papers the GitGuardian security researchers have released, they found that 5.2% of all MCP servers they could find contained at least one hardcoded secret. This is notably higher than the 4.6% occurrence rate of exposed secrets observed in all public repositories.
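The fix for most of those findings is unglamorous: resolve credentials at runtime instead of writing them into the server's source. The sketch below shows a hypothetical MCP-style tool handler doing exactly that; the handler name and the JIRA_API_TOKEN variable are illustrative, not part of any SDK.

```python
import os

def handle_jira_search(query: str) -> dict:
    # Anti-pattern found in the scanned servers: token = "hardcoded-value".
    # Instead, read the credential from the environment (or a secrets
    # manager) so it never lands in the repository alongside the code.
    token = os.environ["JIRA_API_TOKEN"]
    headers = {"Authorization": f"Bearer {token}"}
    # ...call the Jira REST API with these headers and return the results...
    return {"query": query, "authenticated": bool(headers)}
```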

Just like with any other technology you deploy, an ounce of safeguards early in the software development lifecycle can prevent a pound of incidents later on. Catching a hardcoded secret when it is still in a feature branch means it can never be merged and shipped to production. Adding secrets detection to the developer workflow via Git hooks or code editor extensions can mean the plaintext credentials never even make it to the shared repos.
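A minimal version of that guardrail can be a Git pre-commit hook that refuses to commit when a scanner finds a secret. The sketch below assumes ggshield is installed and shells out to its secret scan pre-commit command; in practice you would more likely use ggshield's own hook integration than a hand-rolled script.

```python
#!/usr/bin/env python3
# Minimal .git/hooks/pre-commit sketch; assumes ggshield is installed.
import subprocess
import sys

result = subprocess.run(["ggshield", "secret", "scan", "pre-commit"])
# A non-zero exit code from the scanner aborts the commit, so a hardcoded
# credential never reaches the shared repository in the first place.
sys.exit(result.returncode)
```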

Improve Logging Security

LLMs are black boxes that take requests and give probabilistic answers. While we can't tune the underlying vectorization, we can evaluate whether the output is as expected. To do that, AI engineers and machine learning teams log everything: the initial prompt, the retrieved context, and the generated response, then use those logs to tune and improve their agents.

If a secret is exposed in any one of those logged steps, you now have multiple copies of the same leaked credential, most likely sitting in a third-party tool or platform. Many teams also ship logs to cloud buckets that offer little in the way of fine-grained access controls.

The safest path is to add a sanitization step before the logs are stored or shipped to a third party. This does take some engineering effort to set up, but again, tools like GitGuardian's ggshield are here to help with secrets scanning that can be invoked programmatically from any script. If the secret is scrubbed, the risk is greatly reduced.
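One way to wire in that sanitization step is at the logging layer itself. The sketch below uses a standard-library logging filter with two illustrative regexes; a production setup would lean on a purpose-built scanner such as ggshield rather than a short pattern list.

```python
import logging
import re

# Illustrative patterns only; real deployments should use a dedicated scanner.
REDACT_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"),
]

class SecretRedactingFilter(logging.Filter):
    """Scrub secret-shaped strings before a record is stored or shipped."""
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for pattern in REDACT_PATTERNS:
            message = pattern.sub("[REDACTED]", message)
        record.msg, record.args = message, ()
        return True

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rag-pipeline")
logger.addFilter(SecretRedactingFilter())
logger.info("retrieved context: API_KEY=sk_live_not_a_real_key")  # logged as [REDACTED]
```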

Restrict AI Data Access

Should your LLM have access to your CRM? This is a tricky question and highly situational. If it is an internal sales tool locked down behind SSO that can quickly search notes to improve delivery, it might be OK. For a customer service chatbot on the front page of your website, the answer is a firm no.

Just like we follow the principle of least privilege when setting permissions, we must apply a principle of least access to any AI we deploy. The temptation to grant an AI agent full access to everything in the name of speed is strong, since nobody wants to box in their ability to innovate too early. But granting too little access defeats the purpose of RAG, while granting too much invites abuse and a security incident.
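Here is a sketch of what "least access" can look like at the retrieval layer: each agent gets an explicit allowlist of sources, and everything else is denied by default. The agent names and source labels below are hypothetical.

```python
# Deny-by-default allowlist of data sources per agent (names are illustrative).
ALLOWED_SOURCES = {
    "internal_sales_assistant": {"crm_notes", "pricing_wiki"},
    "public_support_chatbot": {"public_docs"},  # deliberately no CRM access
}

def retrieve(agent: str, source: str, query: str) -> str:
    if source not in ALLOWED_SOURCES.get(agent, set()):
        raise PermissionError(f"{agent} is not allowed to read {source}")
    # ...perform the actual search against the permitted source...
    return f"results for {query!r} from {source}"

print(retrieve("internal_sales_assistant", "crm_notes", "renewal history"))
# retrieve("public_support_chatbot", "crm_notes", ...) raises PermissionError.
```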

Raise Developer Awareness

While it isn't on the list we started from, all of this guidance is useless unless it reaches the right people. The folks on the front line need guidance and guardrails to help them work more efficiently and safely. While we wish there were a magic tech solution to offer here, the truth is that building and deploying AI safely at scale still requires humans getting on the same page with the right processes and policies.

If you are on the development side of the world, we encourage you to share this article with your security team and get their take on how to securely build AI in your organization. If you are a security professional reading this, we invite you to share this with your developers and DevOps teams to further the conversation that AI is here, and we need to be safe as we build it and build with it.

Securing Machine Identity Equals Safer AI Deployments

The next phase of AI adoption will belong to organizations that treat non-human identities with the same rigor and care as they do human users. Continuous monitoring, lifecycle management, and robust secrets governance must become standard operating procedure. By building a secure foundation now, enterprises can confidently scale their AI initiatives and unlock the full promise of intelligent automation, without sacrificing security.
