On July 24, 2025, the European Commission’s AI Office published its template for the public summary of training content for general-purpose AI (GPAI) models under Regulation (EU) 2024/1689 (AI Act). The publication of the template complements the release of the GPAI Code of Practice, a voluntary compliance mechanism, and the guidelines for providers of GPAI models (see our blog posts on the Code of Practice and the guidelines). The template is intended to help providers fulfill their disclosure obligations under Article 53(1)(d) of the AI Act and to support the enforcement of copyright and data protection law.
This blog post summarizes the core features of the template and outlines compliance expectations for AI developers. The template is a mandatory compliance tool that all GPAI providers must use to publicly disclose high-level information about the data used to train their models. Its release marks a pivotal milestone for the EU’s AI governance framework, which is moving from voluntary norms to legally binding obligations.
Background
- AI Act. As explained in a previous blog post, the AI Act introduces specific obligations for providers of GPAI models, which are capable of performing a wide range of tasks and of being integrated into various systems.
- GPAI Provider Obligations. Providers of GPAI models must maintain detailed technical documentation, publish summaries of training data, comply with EU copyright law, and share information with regulators and downstream users. Specifically, Article 53(1)(d) of the AI Act requires GPAI providers to publish a “sufficiently detailed summary” of the content used for the training of the model, according to a template provided by the AI Office, including data protected by copyright law.
- Models with Systemic Risk. Providers offering GPAI models with systemic risk face stricter requirements, including model evaluations, risk mitigation, incident reporting, and cybersecurity measures.
- GPAI Template. The new template requires GPAI providers to disclose structured information about the data used to train their models, including the types of content, data sources, and methods of collection. It plays a central role in enforcing transparency obligations, particularly around copyright compliance and the use of web-scraped and user-generated data.
Scope and Applicability
- All GPAI Providers. The requirement to publish a summary of the training data applies to all GPAI models, including those released under open-source licenses.
- Modified Models. Entities that significantly modify an existing GPAI model, such as through additional training or fine-tuning, must use the template to report information solely about the training content used for those modifications.
- Model Life Cycle Updates. Where additional training occurs post-market, providers must update their summaries every six months, or sooner if the update constitutes a material change.
Structure of the European Commission’s Template
The European Commission’s template standardizes the summary of the training data. It contains three major sections, which require narrative responses. GPAI providers are also encouraged to include additional voluntary information.
- General Information. The first section requires GPAI providers to present a high-level overview of the model, including its intended purpose and general context. This section requires information about:
  - Model and provider identification, including version numbers and publication dates;
  - Modalities covered (e.g., text, images, video, audio);
  - Training data size estimates; and
  - Language and demographic coverage.
- List of Data Sources. The second section requires GPAI providers to identify the types and origins of data sources used to train the model. Providers must:
  - Identify large training datasets individually, while smaller datasets may be described in aggregate.
  - Indicate whether they obtained commercially licensed content via licensing agreements with rights holders or their representatives.
  - Specify whether their training data includes web-scraped content, including a list of the top 10% of domain names used (or the top 5% or 1,000 domains for small and medium-sized enterprises), as well as details about how their web crawlers operated and when the data was collected.
  - Report if they used user-generated data from interactions with the model or other services such as social media or email.
  - Disclose whether synthetic data generated by other AI models was used and, if so, identify the source models and describe their training origins.
- Data Processing Aspects. The final section requires GPAI providers to explain the methods and steps taken to process the data before it was used in training. GPAI providers must describe their copyright compliance by outlining how they respected opt-outs under the EU Copyright Directive’s text and data mining exception. They must also offer a narrative description of content moderation measures used to detect and remove illegal content.
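The web-scraping disclosure in the list of data sources involves a small calculation, which can be sketched in Python. This is an illustration under stated assumptions, not part of the template: the function name `domains_to_disclose`, the input mapping, rounding up to a whole domain, and reading the SME rule as the smaller of 5% and 1,000 domains are all hypothetical choices. The unit used to rank domains is deliberately left open.

```python
def domains_to_disclose(scraped_by_domain, is_sme=False):
    """Return the domains to list, largest amount of scraped content first.

    scraped_by_domain maps a domain name to the amount of content scraped
    from it. The unit (bytes, tokens, documents) is an assumption left to
    the caller.
    """
    # Rank by amount scraped, breaking ties alphabetically for stable output.
    ranked = sorted(scraped_by_domain, key=lambda d: (-scraped_by_domain[d], d))
    if not ranked:
        return []

    def share(percent):
        # Integer ceiling of percent% of the domain count (avoids float error).
        return (len(ranked) * percent + 99) // 100

    if is_sme:
        # Assumption: the SME rule means the smaller of 5% and 1,000 domains.
        cutoff = min(share(5), 1000)
    else:
        cutoff = share(10)
    return ranked[:cutoff]


if __name__ == "__main__":
    sizes = {"a.example": 500, "b.example": 40, "c.example": 900}
    print(domains_to_disclose(sizes))
```

Providers should confirm the ranking metric and the rounding convention against the template itself before relying on any such calculation.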
Enforcement and Next Steps
- Effective Date. The requirement to publish a training data summary took effect on August 2, 2025. GPAI models placed on the market before that date have until August 2, 2027, to publish the summary.
- Supervision and Enforcement. Starting August 2, 2026, the AI Office may verify compliance and issue corrective measures. The European Commission will not perform content-level audits but can act upon complaints or “qualified alerts” issued by the “scientific panel,” an advisory body composed of independent AI experts (Article 90(2) of the AI Act).
- Penalties. Noncompliance can result in fines of up to €15 million or 3% of total worldwide annual turnover in the preceding financial year, whichever is greater (Article 101 of the AI Act).
- Remaining Questions. While the template provides concrete requirements for GPAI providers, it also leaves certain ambiguities unresolved. For example, it is unclear whether the “size of the content scraped” should be measured by file size, token count, or another metric. Additionally, the scope and thresholds for “post-market training” that trigger update obligations remain undefined.
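The publication deadline and the fine ceiling described in this section reduce to simple arithmetic, sketched here in Python. The function names `summary_deadline` and `max_fine_eur` are hypothetical, and the code assumes that a model placed on the market on or after August 2, 2025 must have its summary published at the time of placement. It computes only the statutory ceiling, not the fine that would actually be imposed in a given case.

```python
from datetime import date
from decimal import Decimal

OBLIGATION_START = date(2025, 8, 2)   # summary requirement took effect
LEGACY_DEADLINE = date(2027, 8, 2)    # deadline for models already on the market
FIXED_CAP_EUR = Decimal("15000000")   # EUR 15 million
REVENUE_SHARE = Decimal("0.03")       # 3% of global annual revenue


def summary_deadline(placed_on_market):
    """Date by which the training data summary must be published."""
    if placed_on_market < OBLIGATION_START:
        return LEGACY_DEADLINE
    # Assumption: newer models need the summary when placed on the market.
    return placed_on_market


def max_fine_eur(global_annual_revenue_eur):
    """Upper bound of the fine: the greater of the fixed cap and 3% of revenue."""
    revenue = Decimal(str(global_annual_revenue_eur))
    return max(FIXED_CAP_EUR, revenue * REVENUE_SHARE)


if __name__ == "__main__":
    print(summary_deadline(date(2024, 11, 5)))
    print(max_fine_eur(2_000_000_000))
```

Under these figures, the percentage-based ceiling overtakes the fixed €15 million amount once global annual revenue exceeds €500 million.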
The authors would like to thank Jess Miller for her assistance in preparing this blog post.