The rise of artificial intelligence (AI) and its widespread availability offers significant growth opportunities for businesses. However, it necessitates a robust governance framework to ensure compliance with regulatory requirements, especially under the European Union’s (EU) Artificial Intelligence Act (AI Act) (see our Guide to the AI Act) and the EU General Data Protection Regulation (GDPR). GDPR compliance is so important because (personal) data is a key pillar of AI. For AI to function effectively, it requires abundant, good-quality data so that it can be trained to identify patterns and relationships. Additional personal data is often gathered during deployment and incorporated into AI systems to assist with decision-making about individuals.
In this series of five blog posts, we discuss GDPR compliance throughout the AI development life cycle and when using AI.
This is our third episode. The first and second episodes are available on the WilmerHale Privacy and Cybersecurity Law Blog.
Data Protection by Design
GDPR compliance plays a key role throughout the AI development life cycle, starting from the very first stages. This reflects one of the GDPR’s key requirements and guiding principles: data protection by design (Article 25 GDPR). Businesses are required to implement appropriate technical and organizational measures, such as pseudonymization, both when determining the means of processing and during the processing itself. These measures should implement data protection principles, such as data minimization, and integrate the necessary safeguards into the processing to ensure GDPR compliance and protect individuals’ data protection rights.
AI Development Life Cycle
The AI development life cycle encompasses four distinct phases: planning, design, development, and deployment. In this context, in accordance with the terminology of the EU AI Act, we will refer to both AI models and AI systems.
- AI models are a component of an AI system and are the engines that drive the functionality of AI systems. AI models require the addition of further components, such as a user interface, to become AI systems.
- AI systems present two characteristics: (1) they operate with varying levels of autonomy and (2) they infer from the input they receive how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments.
In this blog post, we focus on the third phase of the AI development life cycle: development. We already discussed the first and second phases (planning and design) in our previous blog posts (see here and here).
The Development Phase
The third phase of the AI development life cycle involves building the AI model, defining its features, and transforming data into a useful representation to improve the model’s performance and boost its explainability. AI training then enables algorithms to learn from the prepared dataset. This is when the model develops and enhances its capacity to make predictions by learning patterns from the data. Validation and testing further ensure model performance and generalization.
In this context, particular attention must be paid to individuals’ GDPR rights and data security, which are key aspects of GDPR compliance and highly relevant for the AI development process.
Right to Information
The GDPR requires that individuals be given specific information so that they can exercise their GDPR rights. The information to be shared varies depending on whether the personal data was collected directly from the individual concerned (Article 13 GDPR) or from another source (Article 14 GDPR).
- If the personal data was collected directly from the individual concerned, the information must be provided when it is obtained.
- If the personal data was collected from another source, the information must be shared within a reasonable period after obtaining the data, and at the latest within one month, having regard to the specific circumstances of the processing. More specific rules apply where the data is to be disclosed to another recipient or used to communicate with the individual concerned.
Any information shared must always be provided in a concise, transparent, intelligible, and easily accessible form, using clear and plain language.
Generally, individuals must know what personal data is processed, why, for how long, with whom it is shared, and what GDPR rights they have, including but not limited to the rights of access, rectification, erasure, restriction of processing, and objection. AI models (and systems) must be built in such a way that they can adapt if individuals exercise these rights.
Right of Access
The GDPR gives individuals the right to obtain confirmation as to whether their personal data is processed and, when this is the case, access to such data and the information listed in Article 15 GDPR, including but not limited to the purposes of the processing, the categories of personal data concerned, the data recipients, and data transfers.
The right to obtain a copy of personal data can be particularly challenging to honor because the rights and freedoms of others must not be adversely affected. Granting access therefore requires appropriate safeguards, such as anonymizing or redacting other individuals’ data before disclosure.
Rights to Rectification and Erasure
Individuals have the right to obtain the rectification of inaccurate personal data concerning them. If AI includes inaccurate personal data, the individual concerned may ask that it be rectified or completed.
Accuracy means something different in AI development than it does under the GDPR. Although it serves different purposes in these two contexts, the two notions of accuracy affect each other.
- Accuracy in AI. In AI development, accuracy refers to the performance of an AI model in correctly predicting or classifying data. It is a measure of how well the model’s outputs match the true values or labels in the dataset. High accuracy indicates that the AI model is reliable and effective in its tasks, such as image recognition, natural language processing, or predictive analytics.
- Accuracy in the GDPR. The GDPR requires that personal data be accurate, kept up to date, and that every reasonable step be taken to ensure that inaccurate personal data is erased or rectified without delay. The GDPR’s focus on accuracy is aimed at protecting individuals’ rights.
- Relationship. All personal data, whether it is an output of an AI system or information about an individual as an input, is subject to the accuracy principle. The accuracy of the output depends on the accuracy of the input. Therefore, when it comes to personal data, the model’s or system’s performance is inherently linked to the GDPR accuracy principle.
- Fairness. A separate but equally important issue is whether AI generates harmful content due to the information it ingests. For example, AI trained on data reflecting gender inequalities can generate results that discriminate against individuals based on their gender. According to the European Data Protection Board (EDPB, the umbrella group of the EU’s data protection authorities), the GDPR fairness principle requires that personal data should not be processed in a way that is unjustifiably detrimental, unlawfully discriminatory, unexpected, or misleading to the individual concerned. To address the risks of bias and discrimination, it is possible to alter the learning procedure, alter the data by adding or removing data concerning underrepresented or overrepresented demographic groupings to balance the training data, or alter the model after it has been trained.
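The data-rebalancing technique mentioned above can be made concrete with a minimal Python sketch (an illustrative example, not from the post; `oversample_balance` is a hypothetical helper) that duplicates records from underrepresented groups until every group reaches the size of the largest one:

```python
import random

def oversample_balance(records, key):
    """Naively rebalance a dataset by duplicating records of
    underrepresented groups until all groups are equally sized."""
    groups = {}
    for record in records:
        groups.setdefault(key(record), []).append(record)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # randomly duplicate existing records to fill the gap
        balanced.extend(random.choices(members, k=target - len(members)))
    return balanced

# e.g., 6 records for group "A" but only 2 for group "B"
records = [{"group": "A"}] * 6 + [{"group": "B"}] * 2
balanced = oversample_balance(records, key=lambda r: r["group"])
```

Oversampling is only one of the options the EDPB-style mitigation list contemplates; altering the learning procedure or post-processing the trained model are alternatives when duplicating records would distort the data.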
In limited circumstances, individuals have the right to obtain the erasure of their personal data. This typically applies when the data in question has been processed unlawfully or is no longer necessary for the purposes of the processing. Data erasure can disrupt the training and performance of AI models that rely on large datasets. Removing data can lead to gaps, reduce the model’s accuracy, and necessitate retraining with updated datasets.
Rights of Restriction and to Object
Individuals have the right to obtain the restriction of processing while the accuracy of their personal data is being verified, where the processing is unlawful or the data is no longer necessary, or where they have objected to processing based on legitimate interests and verification of overriding grounds is pending. The rights to restriction and to object may have an enormous impact on the processing of personal data for building AI models.
- If the processing has been restricted, except for storage purposes, the data in question may only be processed with the individual’s consent or for limited purposes, such as the establishment, exercise, or defense of legal claims, and the protection of other persons’ rights.
- If the processing has been objected to, the personal data can no longer be processed unless compelling legitimate grounds for the processing override the interests, rights, and freedoms of the individual concerned, or for the establishment, exercise, or defense of legal claims.
Security
The GDPR requires implementing appropriate technical and organizational measures to ensure a level of security appropriate to the risk, in particular protecting personal data against accidental or unlawful destruction, loss, alteration, and unauthorized disclosure or access.
It is essential to address the risks posed by potential threats that could result in the exposure of personal data processed during the AI training phase. Typical risks include model inversion, membership inference, and attribute inference.
- Model inversion attacks involve using the output of an AI model to infer the input.
- Membership inference attacks consist of determining whether a specific data point (also called a target sample) was part of the training dataset.
- Attribute inference attacks involve extracting unknown attributes of a training sample from the target model, assuming the attacker already has partial knowledge of that sample.
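As a rough illustration of why these attacks work (a hypothetical heuristic, not drawn from the post), membership inference often exploits the fact that overfitted models are more confident on samples they were trained on than on unseen ones:

```python
def membership_guess(confidence, threshold=0.9):
    """Toy membership-inference heuristic: an attacker who observes only
    the model's output confidence on a target sample guesses that the
    sample was in the training set when that confidence is unusually high.
    The threshold value is purely illustrative."""
    return confidence > threshold

# training samples of an overfitted model tend to score near 1.0 ...
saw_in_training = membership_guess(0.98)   # True
# ... while unseen samples tend to score lower
unseen = membership_guess(0.55)            # False
```

The practical takeaway is that even a model that never outputs raw personal data can leak it indirectly through its behavior, which is why the mitigations below matter during training, not just at deployment.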
These threats can be mitigated using privacy-enhancing technologies, such as the following:
- Differential privacy works by adding random noise to the data, preventing attackers from identifying individuals while allowing useful insight to be drawn from the dataset.
- Federated learning allows different parties to train AI models on their own information. They then combine identified patterns into a global model without having to share any training information with each other. This helps minimize the risk arising from data breaches, as no personal data is held together in a central location.
- Synthetic data is artificial data generated by data synthesis algorithms to reduce the amount of personal data processed (see episode 2).
- Homomorphic encryption allows computations to be performed on encrypted information without first decrypting it. This helps minimize the risk from data breaches because personal data remains encrypted at rest, in transit, and during computation.
- Secure multiparty computation (SMPC) allows different parties to jointly process their combined information without any party needing to share all of its information. SMPC helps minimize the risk from personal data breaches since the shared information is not stored together.
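To make the first of these techniques concrete, here is a minimal sketch of a differentially private counting query (an illustrative example, not from the post; the function names are hypothetical). Because adding or removing one person changes a count by at most 1, Laplace noise with scale 1/epsilon is enough to mask any individual's presence:

```python
import random

def laplace_noise(scale):
    # a Laplace sample is the difference of two exponential samples
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def dp_count(values, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.

    A count has sensitivity 1 (one person's record changes it by at
    most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [34, 41, 29, 56, 62, 38, 45]
noisy = dp_count(ages, lambda a: a > 40, epsilon=0.5)  # true count is 4
```

Smaller epsilon values add more noise and give stronger privacy at the cost of accuracy, which is exactly the trade-off between protecting individuals and drawing useful insight from the dataset.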
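The secret-sharing idea behind SMPC can also be sketched in a few lines (a toy illustration with hypothetical helper names, not a production protocol): each party splits its private value into random additive shares, and the parties combine only the shares, so the joint total is computed without any raw value being revealed:

```python
import random

PRIME = 2**61 - 1  # all share arithmetic is done modulo this prime

def share(secret, n_parties):
    """Split a secret into n additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# three parties each hold a private value and want only the joint total
private_values = [52000, 61000, 47000]
all_shares = [share(v, 3) for v in private_values]
# party i adds up the i-th share of every value; no raw value is exposed
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
total = reconstruct(partial_sums)  # 160000
```

Each individual share is a uniformly random number that reveals nothing on its own, which is what makes the shared information safe to exchange and store separately.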
The authors would like to thank Ekaterina Fakirova for her assistance in preparing this blog post.