How Big AI Developers are Skirting a Mandate for Training Data Transparency
Summary
The EU AI Act includes a crucial yet often neglected provision requiring developers of "general-purpose AI" models to publish a summary of their training data, a measure vital for copyright holders, privacy watchdogs, and researchers. New peer-reviewed research, supported by Mozilla, developed a framework for assessing these summaries and found that open-source developers such as Hugging Face (SmolLM) and Swiss AI (Apertus) meet or exceed the transparency standards, demonstrating that compliance is feasible. The most concerning finding, however, is that leading AI developers, including OpenAI, Google, and xAI, have not published any such summaries, exploiting a legal gray area created by delayed enforcement powers. This non-compliance by the industry's largest players underscores the urgent need for the EU AI Office to prepare for rigorous enforcement, both to ensure transparency and to keep smaller, compliant developers from being disadvantaged.
Key takeaway
Major AI developers such as OpenAI and Google are not complying with the EU AI Act's mandate to publish training data summaries, even though open-source projects have shown that compliance is feasible. New research finds that smaller projects such as Swiss AI's Apertus score highly under its assessment framework, while leading developers exploit an enforcement gap. This non-compliance hinders critical oversight of copyright, privacy, and research, and calls for urgent enforcement by the EU AI Office to ensure accountability and fair competition.
Topics
- AI Training Data Transparency
- EU AI Act
- General-Purpose AI Models
- Regulatory Compliance
- Open-Source AI
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Researcher, AI Engineer, Policy Maker
Editorial summary, takeaway, and curation by AIssential. Original article published by Tech Policy Press.