Microsoft AI Researchers Accidentally Expose 38 Terabytes of Confidential Data

Ravie Lakshmanan | Sep 19, 2023 | Data Safety / Cybersecurity

Microsoft on Monday said it took steps to correct a glaring security gaffe that led to the exposure of 38 terabytes of private data.

The leak was discovered in the company's AI GitHub repository, and the data is said to have been inadvertently made public while a bucket of open-source training data was being published, Wiz said. The exposed data also included a disk backup of two former employees' workstations containing secrets, keys, passwords, and more than 30,000 internal Teams messages.

The repository, named "robust-models-transfer," is no longer accessible. Prior to its takedown, it featured source code and machine learning models pertaining to a 2020 research paper titled "Do Adversarially Robust ImageNet Models Transfer Better?"

"The exposure came as the result of an overly permissive SAS token – an Azure feature that allows users to share data in a manner that is both hard to track and hard to revoke," Wiz said in a report. The issue was reported to Microsoft on June 22, 2023.

Specifically, the repository's README.md file instructed developers to download the models from an Azure Storage URL that accidentally also granted access to the entire storage account, thereby exposing additional private data.

"In addition to the overly permissive access scope, the token was also misconfigured to allow "full control" permissions instead of read-only," Wiz researchers Hillai Ben-Sasson and Ronny Greenberg said. "Meaning, not only could an attacker view all the files in the storage account, but they could delete and overwrite existing files as well."

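To illustrate the two misconfigurations the researchers describe (account-wide scope and more-than-read permissions), the following is a minimal sketch of how a SAS URL's query string could be inspected. It relies on the documented SAS query parameters `ss`, `srt`, `sp`, and `se`; the function name `audit_sas_url`, the 7-day threshold, the set of risky permission letters, and the example URLs are all assumptions made for illustration, not anything Wiz or Microsoft published.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

# Assumed, non-exhaustive subset of SAS permission letters that go beyond
# read/list: write, delete, add, create, update, process.
RISKY_PERMISSIONS = set("wdacup")


def audit_sas_url(url, now=None, max_days=7):
    """Return a list of findings for one SAS URL (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    findings = []

    # An Account SAS carries 'ss' (signed services) and 'srt' (resource types).
    if "ss" in params and "srt" in params:
        findings.append("account-level SAS: scope is the whole storage account")

    # 'sp' lists the granted permissions, one letter each.
    risky = sorted(set(params.get("sp", "")) & RISKY_PERMISSIONS)
    if risky:
        findings.append("grants more than read/list: " + "".join(risky))

    # 'se' is the expiry time in ISO 8601 form.
    expiry = params.get("se")
    if expiry:
        expires = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        if (expires - now).days > max_days:
            findings.append("expires more than %d days from now" % max_days)

    return findings


# Hypothetical URLs with placeholder signatures.
broad = ("https://exampleaccount.blob.core.windows.net/models"
         "?sv=2020-08-04&ss=b&srt=sco&sp=rwdlac&se=2050-01-01T00:00:00Z&sig=PLACEHOLDER")
narrow = ("https://exampleaccount.blob.core.windows.net/models/r50.pt"
          "?sv=2020-08-04&sr=b&sp=r&se=2023-06-25T00:00:00Z&sig=PLACEHOLDER")

check_time = datetime(2023, 6, 22, tzinfo=timezone.utc)
for finding in audit_sas_url(broad, now=check_time):
    print(finding)
```

Under these assumptions, the first URL trips all three checks, while the second (a read-only token for a single blob that expires in three days) produces no findings.
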
In response to the findings, Microsoft said its investigation found no evidence of unauthorized exposure of customer data and that "no other internal services were put at risk because of this issue." It also emphasized that customers need not take any action on their part.

The Windows maker further noted that it revoked the SAS token and blocked all external access to the storage account. The problem was resolved two days after responsible disclosure.

To mitigate such risks going forward, the company has expanded its secret scanning service to include any SAS token that may have overly permissive expirations or privileges. It said it also identified a bug in its scanning system that flagged the specific SAS URL in the repository as a false positive.
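
As a rough idea of what scanning a repository for SAS URLs can look like, here is a minimal sketch that searches text for Azure Storage URLs carrying a `sig` (signature) parameter. It is not Microsoft's secret scanning service; the regular expression, the function name `find_sas_urls`, and the sample README content are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed pattern: https URLs on common Azure Storage endpoints. The character
# class stops at whitespace, quotes, and closing brackets so that Markdown
# links such as [text](url) are cut at the right place.
CANDIDATE_URL = re.compile(
    r"https://[\w-]+\.(?:blob|dfs|file|queue|table)\.core\.windows\.net"
    r"[^\s\"'<>)\]]*"
)


def find_sas_urls(text):
    """Return (line_number, url) pairs for URLs that include a SAS signature."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CANDIDATE_URL.finditer(line):
            query = parse_qs(urlparse(match.group()).query)
            if "sig" in query:  # only signed URLs are treated as secrets
                hits.append((lineno, match.group()))
    return hits


# Hypothetical README content with a placeholder signature.
readme = "\n".join([
    "# Models",
    "Docs: https://exampleaccount.blob.core.windows.net/public/readme.txt",
    "Download: [weights](https://exampleaccount.blob.core.windows.net/models/r50.pt"
    "?sv=2020-08-04&sp=r&se=2023-07-01&sig=PLACEHOLDER)",
])

for lineno, url in find_sas_urls(readme):
    print(lineno, url)
```

A scanner along these lines only locates candidate tokens; deciding whether a given token's expiry or privileges are too broad would require a further check of its `se` and `sp` parameters.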

"Due to the lack of security and governance over Account SAS tokens, they should be considered as sensitive as the account key itself," the researchers said. "Therefore, it is highly recommended to avoid using Account SAS for external sharing. Token creation mistakes can easily go unnoticed and expose sensitive data."

This is not the first time misconfigured Azure storage accounts have come to light. In July 2022, JUMPSEC Labs highlighted a scenario in which a threat actor could take advantage of such accounts to gain access to an enterprise's on-premises environment.

The development is the latest security blunder at Microsoft and comes nearly two weeks after the company revealed that hackers based in China were able to infiltrate the company's systems and steal a highly sensitive signing key by compromising an engineer's corporate account and likely accessing a crash dump of the consumer signing system.

"AI unlocks huge potential for tech companies. However, as data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards," Wiz CTO and co-founder Ami Luttwak said in a statement.

"This emerging technology requires large sets of data to train on. With many development teams needing to manipulate massive amounts of data, share it with their peers or collaborate on public open-source projects, cases like Microsoft's are increasingly hard to monitor and avoid."
