Google said it discovered a zero-day vulnerability in the SQLite open-source database engine using its large language model (LLM) assisted framework called Big Sleep (formerly Project Naptime).
The tech giant described the development as the "first real-world vulnerability" uncovered using the artificial intelligence (AI) agent.
"We believe this is the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software," the Big Sleep team said in a blog post shared with The Hacker News.
The vulnerability in question is a stack buffer underflow in SQLite, which occurs when a piece of software references a memory location prior to the beginning of the memory buffer, thereby resulting in a crash or arbitrary code execution.
"This typically occurs when a pointer or its index is decremented to a position before the buffer, when pointer arithmetic results in a position before the beginning of the valid memory location, or when a negative index is used," according to a Common Weakness Enumeration (CWE) description of the bug class.
Following responsible disclosure, the shortcoming has been addressed as of early October 2024. It's worth noting that the flaw was discovered in a development branch of the library, meaning it was flagged before it made it into an official release.
Project Naptime was first detailed by Google in June 2024 as a technical framework to improve automated vulnerability discovery approaches. It has since evolved into Big Sleep, as part of a broader collaboration between Google Project Zero and Google DeepMind.
With Big Sleep, the idea is to leverage an AI agent to simulate human behavior when identifying and demonstrating security vulnerabilities by taking advantage of an LLM's code comprehension and reasoning abilities.
This entails using a suite of specialized tools that allow the agent to navigate through the target codebase, run Python scripts in a sandboxed environment to generate inputs for fuzzing, and debug the program and observe results.
"We think that this work has tremendous defensive potential. Finding vulnerabilities in software before it's even released, means that there's no scope for attackers to compete: the vulnerabilities are fixed before attackers even have a chance to use them," Google said.
The company, however, also emphasized that these are still experimental results, adding "the position of the Big Sleep team is that at present, it's likely that a target-specific fuzzer would be at least as effective (at finding vulnerabilities)."