On June 19, 2025, CNIL published two additional "how-to-sheets" on artificial intelligence, which aim to clarify the rules applicable to the creation of training datasets containing personal data. The first sets out the conditions under which the legal basis of legitimate interest may be used for the development of an AI system, while the second focuses specifically on the collection of data via web scraping (see our post here).
In its first "how-to-sheet," CNIL explains the requirements that must be met in order to rely on legitimate interest as a legal basis for processing personal data during the development phase of an AI system.
Requirement 1: The interest pursued must be legitimate
CNIL recalls that an interest is presumed legitimate when it is: (i) clearly lawful under applicable law, (ii) sufficiently specific and well-defined, and (iii) real and present.
When the future use of the model is not yet known at the development stage, CNIL recommends referring to the objective of the model's development.
CNIL also notes that relying on legitimate interest does not eliminate the obligation to obtain consent where required by other legislation (e.g., Article 5(2) of the Digital Markets Act (DMA) on the cross-use of personal data).
Requirement 2: The processing must be necessary
Processing is considered necessary if:
- It enables achievement of the pursued interest;
- There are no less privacy-intrusive means to reach the same goal; and
- It complies with the principle of data minimization. The controller must ensure that only the personal data necessary for the purpose are processed or retained, including by assessing whether the data need to be kept in a form that permits identification of data subjects. CNIL also encourages the use of technologies that allow models to be trained with less reliance on personal data.
Requirement 3: The processing must not unduly harm individuals' rights and freedoms
- Assessing positive and negative impacts
To ensure that legitimate interest does not result in a disproportionate impact on individuals' rights and freedoms, the controller must assess both the benefits of processing and its potential adverse effects. The greater the anticipated benefits, the more likely the legitimate interest may outweigh the risks to individuals.
The controller must therefore identify actual or potential consequences for data subjects resulting from both the development and use of the AI system.
CNIL provides a list of criteria to guide this balancing test, which can also be used as part of a Data Protection Impact Assessment (DPIA).
CNIL distinguishes between risks arising during the development phase and risks related to the deployment of the AI system; because of their systemic nature, both must be taken into account from the development phase onward.
- Assessing reasonable expectations
Where processing relies on legitimate interest, the controller must assess whether data subjects can reasonably expect the processing, both in its methods and its consequences.
CNIL identifies criteria for evaluating these expectations, based on the source from which the data were collected.
- Implementing additional safeguards
To limit the impact of the processing on data subjects and to ensure a balance between the rights and interests at stake, CNIL recommends implementing additional technical, organizational, or legal safeguards designed to reduce risks to data subjects' rights and freedoms. These safeguards come in addition to existing obligations under the GDPR, which remain mandatory regardless of the legal basis, and must be proportionate to the risks identified at each stage of development.