The European Data Protection Board Shares Opinion on How to Use AI in Compliance with GDPR

Orrick, Herrington & Sutcliffe LLP

The European Data Protection Board's (EDPB) Opinion 28/2024 provides valuable insights into the intersection of artificial intelligence and data protection, particularly in the context of compliance with the EU General Data Protection Regulation (GDPR). The opinion addresses several questions regarding the processing of personal data during the development and deployment of AI models, emphasizing the importance of lawful, fair and transparent data handling practices. While the EDPB's opinion is not a mandatory interpretation of the GDPR, companies will find guidance on how to address some of the issues that arise when using AI in compliance with GDPR.

The EDPB opinion does not focus on the territorial question of when GDPR applies. If in doubt, companies should first assess whether their use of AI generally falls within the scope of the GDPR as per Article 3 GDPR.

I. Do AI Models Fall Under Data Protection Laws?

In the view of the EDPB, the GDPR regularly applies to AI models trained with personal data.

Different types of AI models are distinguished as follows:

Explicit Data-Providing Models

These models are specifically designed to provide or make available personal data about individuals whose data was used for training. Examples include:

  • Generative models fine-tuned with the personal data of an individual, such as voice recordings used to mimic the individual's voice.
  • Models that reply to prompts with personal data, e.g., "Who is the president of Poland?"

GDPR applicability: Always applicable. The EDPB clearly states that these models inherently process personal data, so they cannot be considered anonymous.

Implicit Data-Embedding Models

These AI models are not intentionally designed to produce identifiable personal information. However, personal data from the training set can still be embedded in the model's parameters, i.e., represented through mathematical objects. Parameters in a model may reflect statistical relationships from the training data, which means it is possible to extract personal data, either accurately or inaccurately, by analyzing these relationships or querying the model.

Information about training data might be extractable through targeted prompts:

  • For instance, by exploiting vulnerabilities in AI models, an attacker can perform a membership inference attack to determine whether specific data was included in the training set (see the illustrative sketch following this overview).
  • Model inversion attacks can be used to reconstruct parts of the original training data.
  • Personal data can also be obtained accidentally through interactions with an AI model, for example as part of an AI system.

GDPR applicability: Subject to case-by-case analysis. AI models trained on personal data cannot always be considered anonymous. Instead, anonymity should be assessed on a case-by-case basis using specific criteria.
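For technically minded readers, the following minimal Python sketch illustrates the intuition behind a loss-threshold membership inference test of the kind referenced above. It is purely illustrative: the toy logistic "model", the feature values and the threshold are assumptions, not part of the EDPB opinion, and real-world attacks (for example, shadow-model approaches) are considerably more sophisticated.

    # Minimal, purely illustrative sketch of a loss-threshold membership inference test.
    # The toy logistic "model", the feature values and the threshold are hypothetical.
    # Intuition: a model that fits its training data closely tends to show unusually low
    # loss on records it has already "seen", hinting that a person's data was in the set.

    import numpy as np

    def predicted_probability(weights: np.ndarray, features: np.ndarray) -> float:
        """Toy binary classifier: logistic regression over one feature vector."""
        return 1.0 / (1.0 + np.exp(-float(features @ weights)))

    def record_loss(weights: np.ndarray, features: np.ndarray, true_label: int) -> float:
        """Cross-entropy loss of the model on a single record (lower = closer fit)."""
        p = min(max(predicted_probability(weights, features), 1e-12), 1.0 - 1e-12)
        return -(true_label * np.log(p) + (1 - true_label) * np.log(1.0 - p))

    def likely_training_member(weights, features, true_label, threshold: float = 0.1) -> bool:
        """Flag a record as a probable training-set member if its loss falls below a threshold."""
        return record_loss(weights, features, true_label) < threshold

    # Hypothetical usage: probe whether one individual's record was part of the training data.
    released_weights = np.array([2.0, -1.5, 0.5])   # stand-in for parameters of a released model
    candidate_record = np.array([1.2, 0.3, 0.8])    # the individual's feature vector
    print(likely_training_member(released_weights, candidate_record, true_label=1))

The EDPB expects controllers to document testing of this kind against widely known, state-of-the-art attacks, as reflected in the documentation elements listed in Section II below.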

 

II. When Can an Implicit Data-Embedding AI Model Be Deemed "Anonymous"?

These AI models typically don't contain directly isolatable or linked data sets, but rather parameters that indicate probabilistic relationships between the data embedded in the model.

To deem such an AI model anonymous, supervisory authorities (SAs) must have adequate evidence that:

  • Personal data from the training set cannot be extracted with reasonable means, and
  • The model's output does not pertain to the individuals whose data was used for training.

To determine whether an AI model meets the conditions of anonymity, SAs should consider three elements:

1. The evaluation of identification risks (whether data subjects can be singled out, linked or have information inferred about them).
2. All means reasonably likely to be used to identify individuals.
3. The residual risk of identification by the controller and by third parties.

First, in line with its past opinions and guidelines, the EDPB considers data anonymous only if it cannot be singled out, linked or inferred. Due to the potential for data extraction and inference, the EDPB recommends a detailed risk assessment for AI models.

Second, this assessment should consider all means reasonably likely to be used to identify individuals. It should be based on the objective factors outlined in Recital 26 GDPR:

  • The characteristics of the training data, AI model, and training procedure.
  • The context in which the AI model is released or processed.
  • Additional information that could enable identification and is accessible to the person attempting identification.
  • The costs and time required to obtain such additional information, if not already available.
  • The available technology at the time of processing and any technological developments.

Third, SAs should check if controllers have evaluated the residual risk of identification by themselves and others, including unintended third parties. They should also assess whether these parties could reasonably access or process the data.

The EDPB lists some elements that SAs may consider when evaluating a controller's claim of anonymity for an AI model:

AI model design

  • Selection of sources.
  • Data preparation and minimization.
  • Methodological choices for training.
  • Measures regarding model outputs.

AI model analysis

  • Appropriateness of measures chosen to reduce the likelihood of re-identification.
  • Document-based audits.
  • Code review reports.

AI model testing and resistance to attacks

  • Scope, frequency, quantity and quality of tests conducted.
  • Testing against widely known, state-of-the-art attacks.

Documentation

  • Legally required documentation of the processing operations.
  • Regular risk assessments.

The EDPB takes the view that if a supervisory authority cannot confirm effective anonymization measures from the documentation, it may conclude that the controller has not fulfilled its accountability obligations under Article 5(2) GDPR.

 

What should the controller's documentation ideally contain?

  • Information on Data Protection Impact Assessments (DPIAs), including assessments and decisions on the necessity of a DPIA.
  • Advice or feedback from the Data Protection Officer (if appointed).
  • Technical and organizational measures taken during AI model design to reduce identification risks, including threat model and risk assessments, with details for each training data source.
  • Measures taken throughout the model's lifecycle that contributed to or verified the absence of personal data.
  • Documentation of the model's theoretical resistance to re-identification techniques and controls for limiting or assessing attack impacts, including data-to-parameter ratios, re-identification likelihood metrics, testing reports and results of the tests.
  • Documentation provided to deploying controllers and data subjects about measures to reduce identification likelihood and any residual risks.

III. What General Principles Guide SAs in Assessing GDPR Compliance for AI Models?

The EDPB emphasizes that the GDPR does not establish a hierarchy among the legal bases for data processing listed in Article 6(1) GDPR. Key principles from Article 5 GDPR should guide the assessment of AI models:

  • Accountability (Article 5(2) GDPR): Controllers must be able to demonstrate GDPR compliance and define roles and responsibilities before processing begins. See Section II above for the documentation the EDPB considers particularly important.
  • Lawfulness, Fairness and Transparency (Article 5(1)(a) GDPR): Processing must be lawful, fair and transparent, with clear information provided to data subjects. Additional information is required in automated decision-making contexts.
  • Purpose Limitation and Data Minimization (Article 5(1)(b), (c) GDPR): Personal data should be adequate, relevant and limited to what is necessary for the specific purpose, with clear identification of processing activities and purposes.
  • Data Subject Rights (Chapter III GDPR): All rights must be respected, including, in particular, the right to object when legitimate interest is used as a legal basis.

IV. (When) Can Controllers Rely on the Legal Basis of Legitimate Interest in the Context of Development and Deployment of AI Models?

A practicable alternative to obtaining consent is to rely on legitimate interest under Article 6(1)(f) GDPR. According to the EDPB, this requires the controller to conduct a thorough three-step assessment. The EDPB refers to 2024 guidance in this regard and, in essence, repeats preexisting conditions for legitimate interest, providing some examples.

Step 1: Identify legitimate interests

An interest can be considered legitimate if it is:

  • Lawful.
  • Clearly articulated.
  • Real and present.
  • Not speculative.

Examples related to AI:

  • Developing a conversational agent to assist users.
  • Creating an AI system to detect fraudulent content or behavior.
  • Enhancing threat detection in an information system.

Step 2: Analysis of the necessity of the processing

The processing must be necessary to pursue the identified interest. The necessity test includes two elements:

  • Whether the processing activity effectively pursues the purpose.
  • Whether a less intrusive way to achieve the same goal exists.

For AI models, this means evaluating whether the volume of personal data processed is proportionate and whether alternatives exist that do not involve the processing of personal data.

Step 3: Balancing test

The legitimate interests of the controller must not be overridden by the interests, rights and freedoms of the data subjects affected by the processing of their personal data.

The impact depends on the nature of the data processed by the models, the processing context and the potential consequences. SAs should evaluate these impacts and the likelihood of further consequences on a case-by-case basis, considering both the development and deployment phases.

The reasonable expectations of the data subjects also play an important role in the legitimate interest assessment. SAs should take into account the following considerations:

  • The impact of the processing on data subjects, e.g.:
    • Whether special categories of data or other sensitive data, such as financial or location data, are affected, since their processing can have significant effects.
    • The status of the data subjects and their relationship with the controller.
    • The volume of data processed.
    • The development methods and security measures applied.
  • The reasonable expectations of the data subjects regarding the development phase, e.g.:
    • The context and source of the data, including whether the training data was publicly available.
    • Whether data subjects are actually aware that their data is available online.
    • The potential further uses of the model.
  • The reasonable expectations of the data subjects regarding the deployment phase, e.g.:
    • Whether data subjects are aware that they provided personal data so that the AI model could adjust its responses to their needs.
    • Whether the processing affects only the individual user's service or is used to modify the service for all customers.

Examples of interests, fundamental rights and freedoms of the data subjects in the AI model development phase:

  • Self-determination.
  • Retaining control over personal data collected for development.

 

In the AI model deployment phase:

  • Maintaining control over personal data processed after deployment.
  • Financial interests (e.g., when the AI model is used in a professional context).
  • Socioeconomic interests (e.g., when the AI model facilitates access to healthcare or education).
  • Other personal benefits from its use.

 

Examples of risks that may arise during development:

  • Collecting personal data without consent or knowledge (e.g., through web scraping).
  • Large-scale data collection that may create a surveillance atmosphere, leading to self-censorship and threatening freedom of expression.

Examples of risks that may arise during deployment:

  • Processing data in ways that violate individuals' rights.
  • Inferring personal data through attacks such as membership inference or model inversion.
  • Blocking content with AI models, which can affect freedom of expression.
  • Recommending inappropriate content to vulnerable individuals, impacting mental health.
  • Discrimination or infringement of the right to engage in work when AI models are used for job pre-selection.
  • Threats to security and safety if AI models are used with malicious intent.

However, AI models can also positively impact rights. For example:

  • Supporting mental well-being.
  • Facilitating access to essential services.
  • Enhancing access to information and education.

 

V. Which Mitigating Measures Can Controllers Use That Impact the Balancing Exercise?

Certain measures can limit the impact of processing on data subjects, thereby allowing the controller to rely on legitimate interest. Mitigating measures should be tailored to the AI model's specific circumstances and intended use. The EDPB emphasizes that mere compliance with legally required measures is not sufficient for this purpose. Where measures are not legally required or exceed the required scope, however, they can be considered for the balancing exercise. The EDPB lists a few examples of such measures:

 

Technical measures

  • During development of AI models, such as:
    • Pseudonymization.
    • Data masking.
    • Substitution with fake personal data in the training set.
    (See the illustrative sketch following this overview.)
  • In the context of web scraping, such as:
    • Excluding data that poses risks to certain individuals or groups.
    • Ensuring sensitive data categories are not collected.
    • Setting criteria based on time periods or other relevant factors to limit data collection.
  • During deployment of AI models, such as:
    • Implementing safeguards to prevent the storage, regurgitation or generation of personal data, particularly in generative AI models.
    • Measures to mitigate the risk of unlawful reuse by general purpose AI models, such as watermarking the outputs.

Measures to facilitate the exercise of individuals' rights

  • During development of AI models, such as:
    • Implementing a reasonable delay between collecting a training dataset and its use, allowing data subjects time to exercise their rights.
  • In the context of web scraping, such as:
    • Creating opt-out lists.
  • During deployment of AI models, such as:
    • Honoring the right to erasure of personal data from model outputs.
    • Deduplication.
    • Post-training techniques to remove personal data.

Transparency measures

  • During development of AI models, such as:
    • Providing accessible information that exceeds Article 13 or 14 GDPR requirements.
    • Utilizing various methods to inform data subjects.
  • In the context of web scraping: no examples provided.
  • During deployment of AI models, such as:
    • Providing data subjects with information from the balancing test in advance of any collection of personal data.
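For implementation teams, the following minimal Python sketch shows what the technical measures listed above, pseudonymization, data masking and substitution with fake personal data, could look like when applied to a single training record. The field names, the salt, the regular expression and the fake value are illustrative assumptions and are not taken from the EDPB opinion; production pipelines would rely on vetted tooling, proper key management and documented testing.

    # Minimal, purely illustrative sketch of pseudonymization, masking and substitution
    # applied to one training record before it enters an AI training pipeline.
    # Field names, the salt, the regex and the fake value are assumptions, not EDPB guidance.

    import hashlib
    import re

    SALT = "store-me-separately-and-rotate"  # hypothetical secret kept outside the training data

    def pseudonymize(value: str) -> str:
        """Replace a direct identifier with a salted hash (re-identification requires the salt)."""
        return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()[:16]

    def mask_free_text(text: str) -> str:
        """Mask an obvious personal data pattern (here: e-mail addresses) in free text."""
        return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[EMAIL_REMOVED]", text)

    def prepare_training_record(record: dict) -> dict:
        """Apply the three example measures: pseudonymization, substitution and masking."""
        return {
            "user_id": pseudonymize(record["user_id"]),   # pseudonymization
            "name": "Jane Doe",                           # substitution with fake personal data
            "comment": mask_free_text(record["comment"]), # data masking
        }

    # Hypothetical usage
    raw_record = {"user_id": "u-1029384", "name": "Maria Rossi", "comment": "Reach me at maria@example.com"}
    print(prepare_training_record(raw_record))

Whether such measures are sufficient for the balancing exercise remains a case-by-case assessment; as noted above, measures that are legally required anyway do not count in the controller's favor.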

VI. If an AI Model Is Found to Have Been Initially Trained in Violation of Data Protection Laws, How Does This Impact Its Future Use?

In the final part of its opinion, the EDPB examines how unlawful data processing during the development phase impacts the subsequent use of an AI model. It first reminds SAs of their responsibility to verify the lawfulness of processing during the initial development phase. It also lists the corrective powers of supervisory authorities, including fines, temporary limitation on the processing, ordering erasure of parts of the dataset that are unlawfully processed or, in some cases, ordering the erasure of the whole dataset or the AI model itself. The EDPB then goes on to outline three scenarios in which unlawful processing during development may impact the later deployment of an AI model:

Scenario 1: Same Controller

A controller unlawfully processes personal data to develop an AI model, retains this data in the model, and then uses it in subsequent processing during deployment.

How does the initial unlawful processing affect later use? Each case requires a specific analysis to determine whether the development and deployment phases serve separate purposes and how the lack of a legal basis for the initial processing affects later stages.

Example: When the subsequent processing is based on legitimate interest under Article 6(1)(f) GDPR, the initial unlawful processing will be considered in the assessment. This includes evaluating risks to data subjects and their expectations regarding further processing. The unlawfulness of the initial processing can therefore affect the legitimacy of subsequent processing activities.

Scenario 2: New Controller

A controller unlawfully processes personal data to develop an AI model, retains the data within the model, and then another controller processes this data during deployment.

How does the initial unlawful processing affect later use? Unlike in Scenario 1, the subsequent processing is handled by a different controller. The initial unlawful processing by the first controller can affect the subsequent processing by the second controller, potentially impacting the lawfulness of the deployment phase. The EDPB emphasizes the importance of identifying the roles and responsibilities of each party under the GDPR. SAs should assess the legality of both the original developer's and the acquiring controller's processing.

Since this scenario is of broad practical relevance, we have examined it in a separate article. You can find it here.

Scenario 3: Subsequent Anonymization

A controller unlawfully processes personal data to develop an AI model but then anonymizes the model before any further processing occurs. The subsequent processing, initiated by the same or another controller, involves new personal data during deployment.

How does the initial unlawful processing affect later use? The focus here is on ensuring that the initial data is fully anonymized, so that it no longer affects the legality of the subsequent processing activities. If the model is genuinely anonymized, the GDPR ceases to apply to the initially unlawfully processed data, and the unlawfulness then does not affect the later use. However, SAs must verify claims of anonymity. If new personal data is processed during deployment after anonymization, the GDPR applies to those activities, but the initial unlawful processing does not affect their legality.

DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations. Attorney Advertising.

© Orrick, Herrington & Sutcliffe LLP
