A critical security flaw has been disclosed in the NVIDIA Container Toolkit that, if successfully exploited, could allow threat actors to break out of the confines of a container and gain full access to the underlying host.
The vulnerability, tracked as CVE-2024-0132, carries a CVSS score of 9.0 out of a maximum of 10.0. It has been addressed in NVIDIA Container Toolkit v1.16.2 and NVIDIA GPU Operator 24.6.2.
"NVIDIA Container Toolkit 1.16.1 or earlier contains a Time-of-Check Time-of-Use (TOCTOU) vulnerability when used with default configuration where a specifically crafted container image may gain access to the host file system," NVIDIA said in an advisory.
"A successful exploit of this vulnerability may lead to code execution, denial of service, escalation of privileges, information disclosure, and data tampering."
The issue impacts all versions of NVIDIA Container Toolkit up to and including v1.16.1, and NVIDIA GPU Operator up to and including 24.6.1. However, it does not affect deployments that use the Container Device Interface (CDI).
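As a rough aid for triage, the version boundaries above can be expressed as a small Python check. This is a minimal sketch, not an official NVIDIA tool: the function names and component labels are hypothetical, it only handles plain dotted versions (no pre-release suffixes), and it does not account for the CDI exception.

```python
def parse_version(text):
    """Turn a plain dotted version such as 'v1.16.1' or '24.6.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


# First fixed releases, as stated in the article.
# The dictionary keys are illustrative labels, not official package names.
FIXED_VERSIONS = {
    "container-toolkit": parse_version("1.16.2"),
    "gpu-operator": parse_version("24.6.2"),
}


def is_affected(component, installed_version):
    """Return True if the installed version is older than the first fixed release."""
    return parse_version(installed_version) < FIXED_VERSIONS[component]


if __name__ == "__main__":
    for component, version in [
        ("container-toolkit", "v1.16.1"),
        ("container-toolkit", "v1.16.2"),
        ("gpu-operator", "24.6.1"),
    ]:
        status = "affected" if is_affected(component, version) else "patched"
        print(f"{component} {version}: {status}")
```

Comparing tuples of integers rather than raw strings matters here: a string comparison would wrongly rank "1.9.0" above "1.16.2".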
Cloud security firm Wiz, which discovered and reported the flaw to NVIDIA on September 1, 2024, said it could allow an attacker who controls the container images run by the Toolkit to perform a container escape and gain full access to the underlying host.
In a hypothetical attack scenario, a threat actor could weaponize the shortcoming by creating a rogue container image that, when run on the target platform either directly or indirectly, grants them full access to the host file system.
This could materialize in the form of a supply chain attack where the victim is tricked into running the malicious image, or, alternatively, via services that allow shared GPU resources.
"With this access, the attacker can now reach the Container Runtime Unix sockets (docker.sock/containerd.sock)," security researchers Shir Tamari, Ronen Shustin, and Andres Riancho said.
"These sockets can be used to execute arbitrary commands on the host system with root privileges, effectively taking control of the machine."
The problem poses a severe risk to orchestrated, multi-tenant environments, as it could permit an attacker to escape the container and obtain access to data and secrets of other applications running on the same node, and even the same cluster.
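Defenders who want to see whether the runtime sockets named by the researchers are visible from a given environment could start from something like the Python sketch below. The two paths are common default locations and are an assumption here, not details taken from the advisory; the function name is hypothetical, and finding no socket does not mean a host is unaffected.

```python
import os
import stat

# Common default socket locations (an assumption; installations may differ).
DEFAULT_SOCKET_PATHS = (
    "/var/run/docker.sock",
    "/run/containerd/containerd.sock",
)


def visible_runtime_sockets(paths=DEFAULT_SOCKET_PATHS):
    """Return the subset of paths that exist and are Unix domain sockets."""
    found = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # Path is missing or not accessible from this environment.
            continue
        if stat.S_ISSOCK(mode):
            found.append(path)
    return found


if __name__ == "__main__":
    sockets = visible_runtime_sockets()
    if sockets:
        print("Runtime sockets visible here:", ", ".join(sockets))
    else:
        print("No runtime sockets found at the default paths.")
```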
Technical aspects of the attack have been withheld at this stage to prevent exploitation efforts. Users are strongly advised to apply the patches to guard against potential threats.
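For readers unfamiliar with the bug class named in NVIDIA's advisory, a TOCTOU flaw in general follows the pattern in the Python sketch below. It is a generic illustration only: it is not the withheld exploit, it is not code from the Toolkit, and both function names are hypothetical. It also assumes a Unix-like system, where `os.O_NOFOLLOW` is available.

```python
import os
import stat


def read_if_regular_racy(path):
    """Check-then-use: the path may change between the check and the open."""
    # Time of check: confirm the path is a regular file and not a symlink.
    if os.path.islink(path) or not os.path.isfile(path):
        raise ValueError("not a regular file")
    # Race window: another process could swap the path for a symlink here.
    # Time of use: open() resolves the path again and may follow that symlink.
    with open(path, "rb") as handle:
        return handle.read()


def read_if_regular_safer(path):
    """Open first, then validate the already-open descriptor."""
    # O_NOFOLLOW makes the open fail if the final path component is a symlink.
    # It does not protect against symlinks in parent directories.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    try:
        # fstat() inspects the object we actually opened, so the check and
        # the use refer to the same file even if the path changes afterwards.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise ValueError("not a regular file")
        with os.fdopen(fd, "rb", closefd=False) as handle:
            return handle.read()
    finally:
        os.close(fd)
```

The safer variant narrows the window by validating the open file descriptor rather than the path name, which is the usual mitigation for this class of bug.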
"While the hype concerning AI security risks tends to focus on futuristic AI-based attacks, 'old-school' infrastructure vulnerabilities in the ever-growing AI tech stack remain the immediate risk that security teams should prioritize and protect against," the researchers said.