by David Dumont and Tiago Sérgio Cabral
On June 7, 2024, following a public consultation, the French Data Protection Authority (the “CNIL”) published the final version of its guidelines addressing the development of AI systems from a data protection perspective (the “Guidelines”). Read our blog on the pre-public consultation version of these Guidelines.
In the Guidelines, the CNIL states that, in its view, the successful development of AI systems can be reconciled with the challenges of protecting privacy.
The Guidelines are divided into seven “AI how-to sheets” in which the CNIL seeks to guide organizations through the necessary steps to take in order to develop AI systems in a manner compatible with the GDPR. The “AI how-to sheets” provide guidance on: (1) determining the applicable legal regime (e.g., the GDPR or the Law Enforcement Directive); (2) defining a purpose; (3) determining the legal qualification of AI system providers (e.g., controller, processor or joint controller); (4) ensuring the lawfulness of the data processing; (5) carrying out a data protection impact assessment (“DPIA”) when necessary; (6) taking into account data protection when designing the AI system; and (7) taking into account data protection in data collection and management.
Noteworthy takeaways from the Guidelines include:
- In line with the GDPR, the purpose for processing personal data in the development of an AI system must be specific, explicit and legitimate. Additionally, the data must not be further processed in a manner incompatible with this initial purpose, as per the principle of purpose limitation. The CNIL clarifies that where an AI system is developed for a single operational use, the purpose for processing personal data in the development phase is directly related to the purpose of processing in the deployment phase. Therefore, if the purpose in the deployment phase is specific, explicit and legitimate, the purpose in the development phase will also be sufficiently determined. However, for certain AI systems, such as general purpose AI systems, the operational use may not be clearly identifiable in the development phase. In this case, for the purpose of the processing to be deemed sufficiently precise, the data subject must be provided with information on the type of system developed (e.g., large language model) in a clear and intelligible way, and the controller should assess the technically feasible functionalities and capabilities of the AI system (e.g., the controller must draw up a list of capabilities that it can reasonably foresee at the development stage).
- The role of the parties involved in processing operations related to AI systems should be assessed on a case-by-case basis. However, the CNIL draws attention to certain elements that should be considered when carrying out this analysis. For example, a provider that is at the initial development of an AI system and creates the training dataset based on data it has selected on its own account should generally be considered as a controller.
- Consent, legitimate interests, performance of a contract and public interest may all theoretically serve as legal bases for the development of AI systems. Legal obligation could also serve as a legal basis for the deployment of AI systems, but the CNIL considers it difficult to rely on this basis for development in most cases. Controllers must carefully assess the most adequate legal basis for their specific case.
- DPIAs carried out to address the processing of data for the development of AI systems must address specific AI risks, such as the risk of automated discrimination caused by the AI system, the risks related to the confidentiality of the data that could be extracted from the AI system, the risk of producing fictitious content about a real person, or the risks associated with known attacks specific to AI systems (e.g., attacks by data poisoning, insertion of a backdoor or model inversion).
- Involving an ethics committee in the development of AI systems is a good practice to ensure that ethical issues and the protection of human rights and freedoms are taken into account upstream.
- Data minimization and data protection measures that have been implemented during data collection may become obsolete over time and must be continuously monitored and updated when required.
- Datasets, particularly those publicly available on the Internet, may be re-used to train AI systems, provided that the data was lawfully collected and that the purpose of re-use is compatible with the original collection purpose.
In the coming months, the CNIL will supplement these Guidelines with further how-to sheets, including on the legal basis of legitimate interest, the management of data subject rights, the information of data subjects, and annotation and security during the development phase.
Read the Guidelines and the press release.
David Dumont is a Partner and Tiago Sérgio Cabral is an Associate at Hunton Andrews Kurth LLP. This post first appeared on the firm’s blog.
The views, opinions and positions expressed within all posts are those of the author(s) alone and do not represent those of the Program on Corporate Compliance and Enforcement (PCCE) or of the New York University School of Law. PCCE makes no representations as to the accuracy, completeness and validity of any statements made on this site and will not be liable for any errors, omissions or representations. The copyright of this content belongs to the author(s) and any liability with regard to infringement of intellectual property rights remains with the author(s).