Generative AI Data Leakage

As the adoption of generative AI tools, like ChatGPT, continues to surge, so does the risk of data exposure. According to Gartner's "Emerging Tech: Top 4 Security Risks of GenAI" report, privacy and data security is one of the four major emerging risks within generative AI. A new webinar featuring a multi-time Fortune 100 CISO and the CEO of LayerX, a browser extension solution, delves into this critical risk.

Throughout the webinar, the speakers will explain why data security is at risk and explore how well DLP solutions can, or cannot, protect against that risk. Then, they will outline the capabilities DLP solutions need so that businesses can benefit from the productivity GenAI applications offer without compromising security.

The Business and Security Risks of Generative AI Applications

GenAI security risks arise when employees insert sensitive text into these applications. These actions warrant careful consideration, because the inserted data can become part of the AI's training set. This means that the model learns from this data and may draw on it when generating future responses.

Two main dangers stem from this behavior. The first is the immediate risk of data leakage: sensitive information might be exposed in a response the application generates for another user's query. Imagine a scenario where an employee pastes proprietary code into a generative AI tool for analysis. Later, a different user might receive a snippet of that code as part of a generated response, compromising its confidentiality.

Second, there's a longer-term risk concerning data retention, compliance, and governance. Even if the data isn't immediately exposed, it may be stored in the AI's training set for an indefinite period. This raises questions about how securely the data is stored, who has access to it, and what measures are in place to ensure it doesn't get exposed in the future.

44% Increase in GenAI Usage

Several types of sensitive data are at risk of being leaked. The main ones are business financial information, source code, business plans, and PII. Leaking them could result in irreparable harm to the business strategy, loss of internal IP, breaches of third-party confidentiality, and violations of customer privacy, which could eventually lead to brand degradation and legal implications.

The data supports the concern. Research conducted by LayerX on its own user data shows that employee usage of generative AI applications increased by 44% over the course of 2023, with 6% of employees pasting sensitive data into these applications and 4% doing so on a weekly basis.

Where DLP Solutions Fail to Deliver

Traditionally, DLP solutions were designed to protect against data leakage. These tools, which became the cornerstone of cybersecurity strategies over the years, safeguard sensitive data from unauthorized access and transfers. DLP solutions are particularly effective when dealing with data files like documents, spreadsheets, or PDFs. They can monitor the flow of these files across a network and flag or block any unauthorized attempts to move or share them.

However, the landscape of data security is evolving, and so are the methods of data leakage. One area where traditional DLP solutions fall short is in controlling text pasting. Text-based data can be copied and pasted across different platforms without triggering the same security protocols. Consequently, traditional DLP solutions are not designed to analyze or block the pasting of sensitive text into generative AI applications.

Moreover, CASB DLP solutions, a subset of DLP technologies, have their own limitations. They are generally effective only for sanctioned applications within an organization's network. This means that if an employee were to paste sensitive text into an unsanctioned AI application, the CASB DLP would likely not detect or prevent this action, leaving the organization vulnerable.

The Solution: A GenAI DLP

The solution is a generative AI DLP or a Web DLP. A generative AI DLP can continuously monitor text pasting actions across various platforms and applications. It uses ML algorithms to analyze the text in real time, identifying patterns or keywords that might indicate sensitive information. Once such data is detected, the system can take immediate action, such as issuing a warning, blocking access, or preventing the pasting action altogether. This level of granularity in monitoring and response is something that traditional DLP solutions cannot offer.
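
To make the detect-then-act flow more concrete, here is a minimal sketch in Python. It is not LayerX's implementation or any vendor's API. It swaps the ML analysis described above for a handful of simple regular expressions, and the rule list, the Action values, and the evaluate_paste function are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


# Hypothetical detection rules: (label, pattern, action taken on a match).
# A real product would likely use trained classifiers rather than regexes.
RULES = [
    ("email address",
     re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), Action.WARN),
    ("US SSN-like number",
     re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), Action.BLOCK),
    ("private key header",
     re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"), Action.BLOCK),
    ("confidentiality keyword",
     re.compile(r"\b(?:confidential|internal only)\b", re.IGNORECASE),
     Action.WARN),
]

# Ordering used to pick the strictest action among all matching rules.
SEVERITY = {Action.ALLOW: 0, Action.WARN: 1, Action.BLOCK: 2}


@dataclass
class Verdict:
    action: Action
    matched: list


def evaluate_paste(text: str) -> Verdict:
    """Scan pasted text and return the strictest action any rule demands."""
    matched = []
    action = Action.ALLOW
    for label, pattern, rule_action in RULES:
        if pattern.search(text):
            matched.append(label)
            if SEVERITY[rule_action] > SEVERITY[action]:
                action = rule_action
    return Verdict(action, matched)
```

In a browser-based deployment, a check like this would presumably run on the paste event before the text reaches the GenAI page, with the returned action deciding whether the paste proceeds, triggers a warning, or is cancelled.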

Web DLP solutions go the extra mile and can identify any data-related actions to and from web locations. Through advanced analytics, the system can differentiate between safe and unsafe web locations and even managed and unmanaged devices. This level of sophistication allows organizations to better protect their data and ensure that it is being accessed and used in a secure manner. This also helps organizations comply with regulations and industry standards.
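
As a rough illustration of how destination and device context could feed into such a decision, the following Python sketch combines the two signals into a single policy. The host allowlist, the DataEvent fields, and the decide function are assumptions made for this example, and the example.com hosts are placeholders. An actual Web DLP product would rely on far richer analytics.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist of sanctioned web destinations.
SANCTIONED_HOSTS = {"intranet.example.com", "docs.example.com"}


@dataclass(frozen=True)
class DataEvent:
    url: str                       # where the data is being sent
    device_managed: bool           # True if the device is company-managed
    contains_sensitive_data: bool  # result of an upstream content check


def decide(event: DataEvent) -> str:
    """Return 'allow', 'warn', or 'block' for a data-related web action."""
    host = urlparse(event.url).hostname or ""
    sanctioned = host in SANCTIONED_HOSTS

    if not event.contains_sensitive_data:
        return "allow"
    if not sanctioned:
        # Sensitive data headed to an unsanctioned location.
        return "block"
    if not event.device_managed:
        # Sanctioned destination, but reached from an unmanaged device.
        return "warn"
    return "allow"
```

The ordering of the checks is a design choice in this sketch: the destination is evaluated before the device, so sensitive data sent to an unsanctioned site is blocked even from a managed device.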

What does Gartner have to say about DLP? How often do employees visit generative AI applications? What does a GenAI DLP solution look like? Find the answers and more by signing up for the webinar here.


This article is a contributed piece from one of our valued partners.