European Data Protection Board Report on AI Privacy Risks & Mitigations in Large Language Models

King & Spalding

Large Language Models (“LLMs”) are a subset of artificial intelligence (“AI”) that use a type of machine learning called deep learning to model how characters, words, and sentences function together. The advent of LLMs represents one of the most disruptive transformations in the evolution of AI, with LLMs now employed across a wide range of contexts, from chatbots and content creation to coding assistance and business process automation.

The development of these AI tools powered by LLMs – which generally rely on vast amounts of data, including personal data, as part of their training datasets – also raises significant concerns from a data protection law standpoint. On 10 April 2025, the European Data Protection Board (“EDPB”) published a report outlining a risk management methodology for LLMs (“Report”).

The Report goes beyond a theoretical overview of the main categories of privacy risks – it proposes an operational methodology for identifying, assessing, and mitigating the risks associated with LLMs, adopting an approach grounded in the GDPR principles of data protection by design and by default, data minimization, and accountability. It also includes a systematic review of technical and organizational measures recommended to reduce exposure to risk and outlines the implications arising from the distribution of roles and responsibilities among the various actors involved in the AI system lifecycle.

The Report’s key themes as applied to the use of LLMs in the context of a chatbot are as follows:

Data Flow in LLM Systems

Understanding the data flow in AI systems powered by LLMs is crucial for assessing privacy risks and identifying the appropriate mitigation measures. The Report recommends that organizations have a clear overview of the possible architecture of the AI system at the early design and development stage, to better understand the data flows and potential risks associated with its development and deployment.

As a working example, the Report examines a customer assistance chatbot to illustrate the proposed risk assessment methodology. First, the expected data flow for the processing of personal data is considered, and nine distinct stages are identified:

1. User input – Users interact with the chatbot directly by providing their name, email address, and preferences through an interface (e.g., a website or mobile app).

2. Data preprocessing and API interaction – User input is validated and formatted before being sent to the chatbot’s API for processing. The chatbot interacts with a fine-tuned, off-the-shelf LLM hosted in the cloud.

3. Retrieval-Augmented Generation (“RAG”) – For queries requiring domain-specific knowledge or up-to-date context, the system performs a retrieval step: it searches the company’s CRM, document database, or knowledge base for relevant information. The retrieved content is then combined with the user input and passed to the LLM to generate a grounded, personalized response (a minimal illustrative sketch of this flow follows the list).

4. Pre-fine-tuned LLM processing – The chatbot uses a fine-tuned LLM trained on enterprise-specific data to enhance general language understanding and tone alignment. This LLM uses the enriched input (from users and RAG) to personalize outputs.

5. Data storage – Pre-processed user input (e.g., preferences) is stored locally or in the cloud to enable personalized recommendations and facilitate future interactions.

6. Personalized response generation – The chatbot uses stored user data from the CRM system and the fine-tuned LLM’s capabilities to generate tailored recommendations and responses.

7. Data sharing – The chatbot may share minimal (anonymized) user data with external services (e.g., third-party APIs for additional functionality or promotional tools).

8. Feedback collection – Users provide feedback on chatbot interactions (e.g., thumbs-up/down, comments) to improve the system’s performance. This feedback is processed by the system for analytics purposes.

9. Deletion and user rights management – Users can request access to, deletion of, or updates to their personal data in compliance with the EU GDPR.
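
For readers less familiar with this architecture, stages 2 through 4 can be pictured as a short processing pipeline. The sketch below is purely illustrative and is not drawn from the Report; the function names (search_knowledge_base, call_llm) and their behavior are hypothetical placeholders for whatever retrieval, model-hosting, and filtering components an organization actually uses.

```python
# Illustrative sketch only (not from the EDPB Report): one chatbot turn,
# mirroring stages 2-4 above. All names and behaviors are hypothetical.

def search_knowledge_base(query: str, top_k: int = 3) -> list[str]:
    """Stage 3 (RAG): retrieve relevant snippets from an approved,
    privacy-screened source (e.g., filtered CRM data or a document store)."""
    return []  # placeholder for a vector or keyword search over vetted content

def call_llm(prompt: str) -> str:
    """Stage 4: send the enriched prompt to the fine-tuned, cloud-hosted LLM
    over an encrypted, authenticated API (details depend on the provider)."""
    return ""  # placeholder for the provider-specific API call

def handle_user_message(user_message: str) -> str:
    # Stage 2: validate and format the raw user input before any API call.
    cleaned = user_message.strip()
    if not cleaned:
        return "Could you rephrase your question?"

    # Stage 3: retrieval step grounding the answer in domain-specific context.
    context_snippets = search_knowledge_base(cleaned)

    # Stage 4: combine the user input and retrieved context into one prompt
    # for the fine-tuned LLM, which generates the personalized response.
    prompt = "Context:\n" + "\n".join(context_snippets) + "\n\nUser: " + cleaned
    return call_llm(prompt)
```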

 

Mapping data flows throughout the lifecycle of the AI system (e.g., by identifying the sources of data, the categories of data recipients, storage and transfer locations, and retention periods) is crucial to staying on top of, and mitigating, potential privacy risks. For example, as the chatbot processes user personal data, understanding where this data may be transferred (e.g., to a third country with no adequacy decision from the European Commission) is critical for building appropriate safeguards into the contractual arrangements with vendors, such as cloud providers.
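
One way to operationalize this mapping, offered here only as an illustration rather than anything prescribed by the Report, is to keep a structured register with one entry per processing stage, capturing the elements listed above (data categories, source, recipients, storage location, transfers, and retention). A minimal sketch, with hypothetical field values, might look as follows:

```python
# Illustrative only: a simple data-flow register for the chatbot example.
# Field values are hypothetical and would come from the organization's own mapping.
from dataclasses import dataclass

@dataclass
class DataFlowEntry:
    stage: str                  # e.g., "User input", "RAG", "Data sharing"
    data_categories: list[str]  # personal data processed at this stage
    source: str                 # where the data originates
    recipients: list[str]       # internal and external recipients
    storage_location: str       # e.g., "EU cloud region", "on-premises"
    transfer_outside_eea: bool  # triggers review of transfer safeguards
    retention: str              # retention period or deletion trigger

data_map = [
    DataFlowEntry(
        stage="User input",
        data_categories=["name", "email address", "preferences"],
        source="website / mobile app interface",
        recipients=["chatbot API", "cloud-hosted LLM provider"],
        storage_location="EU cloud region (assumed)",
        transfer_outside_eea=False,
        retention="duration of the customer relationship (assumed)",
    ),
    # ... one entry per stage, through deletion and user rights management
]

# Flag stages that may require transfer safeguards (e.g., with cloud vendors).
needs_transfer_review = [e.stage for e in data_map if e.transfer_outside_eea]
```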

Risk Assessment

Following a comprehensive data mapping exercise, organizations should assess the risks associated with the deployment of the chatbot by involving key business stakeholders who have decision-making authority and are directly involved in the development, deployment, and use of the AI system. These stakeholders would include representatives from teams such as engineering, IT/security, privacy, and UX design, who, working collaboratively, will be best placed to identify any cross-functional risks.

The Report considers three main risk factors that may affect the deployment of a chatbot:

1. Large scale processing – A significant volume of user data will be processed as a result of the users’ interactions with the chatbot.

2. Low data quality – Customer query inputs may contain low-quality data, which could lead to inaccuracies or inefficiencies in processing.

3. Insufficient security measures – There is a potential risk of transferring personal data to countries without an adequate level of protection, especially if the LLM model is hosted or maintained in third countries.

Although a chatbot of this kind would not be classified as a high-risk AI system under the EU AI Act, the Report recommends undertaking a Data Protection Impact Assessment (“DPIA”) in this working example as key to evidencing the organization’s accountability under the GDPR.

To the extent any risks have been identified, organizations should consider both the severity of the potential privacy impact on data subjects and the probability of such risks materializing, so that each risk can be classified appropriately.
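
As a purely illustrative way of combining those two dimensions (the Report’s own scales and methodology should be consulted for the actual classification), severity and probability can be mapped onto a simple qualitative risk level:

```python
# Illustrative only: deriving a qualitative risk level from severity and
# probability. The three-level scales and thresholds are assumptions,
# not values prescribed by the EDPB Report.

LEVELS = {"low": 1, "medium": 2, "high": 3}

def classify_risk(severity: str, probability: str) -> str:
    score = LEVELS[severity] * LEVELS[probability]
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"

# Example: large-scale processing judged high severity and medium probability.
print(classify_risk("high", "medium"))  # -> "high"
```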

Risk Mitigation

After evaluating the identified risks, organizations should consider mitigation measures to reduce the probability and severity of those risks. In the context of the chatbot working example, the Report sets out the following recommendations to mitigate each of the three risks identified above:

1. Large scale processing:

a. applying post-processing/output filters to remove or redact sensitive information from responses (a minimal illustrative filter is sketched after this list);

b. implementing relevance filters or scoring mechanisms to ensure only appropriate content is passed to the LLM; and

c. restricting retrieval sources to approved, privacy-screened datasets (e.g., filtered CRM data).

2. Low data quality:

a. evaluating chatbot responses regularly for accuracy and relevance;

b. training the model on high-quality, diverse datasets to reduce biases; and

c. including disclaimers in chatbot responses to clarify they are AI-generated and not definitive advice.

3. Insufficient security measures:

a. securing data transmission using adequate encryption protocols;

b. using robust API security measures, including access controls, authentication, and rate limiting;

c. encrypting stored data and implementing access controls; and

d. applying retrieval filters and output sanitization to reduce the risk of the chatbot leaking sensitive information.
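
The post-processing/output filters referred to in mitigations 1(a) and 3(d) can take many forms. The sketch below is illustrative only and is not drawn from the Report: it uses two simple patterns to redact obvious identifiers (email addresses and phone numbers) from a response before it is returned to the user; real deployments would typically combine such rules with more robust detection of personal data.

```python
# Illustrative only: a simple post-processing/output filter that redacts
# obvious identifiers from a chatbot response before it reaches the user.
# Real systems would need more robust PII detection than these two patterns.
import re

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_output(response: str) -> str:
    response = EMAIL_PATTERN.sub("[redacted email]", response)
    response = PHONE_PATTERN.sub("[redacted phone]", response)
    return response

# Example usage:
print(redact_output("Contact Jane at jane.doe@example.com or +44 20 7946 0958."))
# -> "Contact Jane at [redacted email] or [redacted phone]."
```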

Following the implementation of the mitigation measures, organizations should re-run their risk assessment to obtain an updated risk classification and determine whether any residual risks remain. Organizations should keep this process under review and repeat it whenever new functionalities are added to the chatbot.

Conclusion

The Report is not just for privacy teams: it is a strategic playbook for any organization deploying generative AI, particularly one providing or deploying an AI system on the EU market within the scope of the EU AI Act. It also serves as an illustrative guide and benchmark for risk assessments by organizations with a global footprint, whether or not they are subject to the GDPR.

The key takeaways for organizations are:

1. Risk assessments must go beyond surface-level checks and account for actual use cases. This exercise must be cross-functional to ensure a holistic assessment of all possible risks.

2. AI governance starts as early as the AI system design stage and continues throughout the AI lifecycle, including procurement, implementation, and updates.

3. LLM ecosystems are complex: cloud providers, API users, internal development teams, and deployers all play a role. Data mapping is key to staying on top of a complex legal and regulatory compliance framework in Europe, including the GDPR and the EU AI Act.

DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations. Attorney Advertising.

© King & Spalding

