This article is based on a presentation at Womble Bond Dickinson’s AI Intensive: Playbook for Innovation and Risk Mitigation virtual summit on May 20, 2025 (including some updates in respect of the UK due to legislative developments). The panel included Womble Bond Dickinson (US) Partner Chris Mammen, Womble Bond Dickinson (UK) Managing Associate Andrew Jerrard and Skybound Entertainment Senior Legal Counsel, Games Sam Lam.
Text and data mining (TDM) is a hot topic, both in corporate IP circles and in the general news.
What is text and data mining? The UK government has described it as, “using computational techniques to analyze large amounts of information to identify patterns, trends and other useful information.”
TDM is seen as a way to unlock the amazing potential of artificial intelligence. But it also presents a wide range of legal, business, regulatory, and ethical challenges. “It’s not a new concept,” Mammen said. But TDM is critical to training AI models, and that is changing how TDM is considered from both intellectual property and commercial perspectives.
Copyright law protects against unauthorized copying of copyrighted material unless an exception applies. So-called TDM exemptions allow other users to make copies of copyrighted material for the purpose of automated analysis (TDM) without needing to obtain permission from the copyright holder. Historically, TDM exemptions were intended primarily for non-commercial research. But there has been recent discussion about extending TDM exemptions to commercial AI applications, as opposed to pure research use. Such a shift would have significant implications for IP law and AI innovation.
“This is something we should care about if we are creatives, employ LLMs, or use data,” Lam said.
TDM Legal Standards in the US, UK, and Around the World
The U.S. is one of the few jurisdictions that doesn’t have TDM exemptions enshrined in statutory law. Instead, Lam said the “Fair Use” provisions under the Copyright Act serve as the U.S. analogue to TDM exemptions. The four factors judges consider under Fair Use are:
- The purpose and character of the use;
- The nature of the copyrighted work;
- The amount and substantiality of the portion taken; and
- The effect of the use upon the potential market for or value of the copyrighted work.
“There is a pretty robust jurisprudence under Fair Use that effectively establishes a TDM exemption in the U.S., but it isn’t in the copyright statute,” Lam said.
The UK has a non-commercial research TDM exemption established by law (where access to the work is lawful). But Jerrard said this may be expanded due to both technological and political factors. The previous UK government proposed a blanket TDM exemption for AI training. “But the pushback from creatives was so strong that the government dropped that proposal,” he said.
After an unsuccessful attempt by a cross-industry group to put forward an approach, an open consultation set out the UK government’s preferred approach: a TDM exemption for AI training (even for commercial use) that gives rights holders the option to opt out. Jerrard said the government’s goal is to create a regulatory environment conducive to attracting international AI training to the UK.
But the matter may not yet be settled. “The government is now considering over 11,500 responses to that consultation,” Jerrard said. Such feedback may change the government’s preferred approach.
The Data (Use and Access) Act 2025 was passed in the UK on June 19, 2025 after much parliamentary ping-pong. The House of Lords wanted to give more protection to rights holders (including extra-territorial application of UK copyright laws in the context of web-scraping and AI system training), while the House of Commons wanted to defer the issue so that AI copyright issues could be approached in a more holistic manner. The Act was finally passed with a requirement that the UK government publish two reports within nine months: (1) an assessment of the economic impact in the UK of each of the four policy options considered in the consultation mentioned above (including the impact on copyright owners and AI system developers and users), and (2) a report on the use of copyrighted works in the development of AI systems. Each report must address certain specified items.
Since June 7, 2021, the European Union has had TDM exemptions under the Digital Single Market Directive, so long as access to the work is lawful, for (1) scientific research by research and cultural institutions, and (2) use by anyone, unless the rights holder has expressly reserved its rights. This was effectively confirmed under the EU AI Act, which passed on June 13, 2024.
Lam said Japan’s approach is somewhere between the UK and U.S. approaches. Japan’s legislature amended that country’s copyright act to include an exception, which states: “It is permissible to exploit any work in any case in which it is not a person’s purpose to personally enjoy or cause another person to enjoy the thoughts or sentiments expressed in that work.” So, while there is a statutory exemption, its definition is murky.
Part of the copyright challenge around TDM and machine learning is that the original work is not necessarily reintroduced downstream. Machine-learning models typically output something trained on, but not identical to, the original work, Lam said.
Mammen said a group of creative industries in South Korea is asking the government to protect creators’ rights in AI usage.
TDM Exemption Macro Themes
While the specifics may vary from nation to nation, Jerrard said, “Countries around the world are grappling with the same issues.”
These issues include perceptions and misperceptions about what other countries are doing and why. For example, many copyright owners mistakenly believe the U.S. has a formal TDM exemption. Further, many of the laws being applied were written before the advent of generative AI.
One main reason for the confusion is that TDM exemption policies are in flux everywhere. AI is one area where the evolution of technology is several steps ahead of the law—and continues to evolve faster than the law can catch up.
Another big-picture issue is the shortcomings associated with shoehorning AI into existing legal frameworks.
“I understand the rush to look to copyright law to solve these challenges,” Lam said. But copyright laws may be insufficient because they address copying and the reproduction of works. Does AI copy or reproduce existing work? He said he’s paying close attention to U.S. class actions in multiple circuits. Different jurisdictions may reach different conclusions.
“One of the questions is whether AI ingesting these data sets is [prohibited] copying at all,” Mammen said. Some advocates argue that ingesting copyrighted material as AI training data merely involves making intermediate, transitory copies, and is not, as such, prohibited under the copyright laws.
Implications for Stakeholders
For in-house counsel, chain of title and due diligence on third-party data are important considerations. However, TDM creates a situation where companies don’t know where their data is coming from. Lam said this could be good news in an IP dispute, as the defendant company would not be able to produce this source information in discovery. However, it also creates significant uncertainty for in-house counsel.
“AI creates a whole different ballgame. Sometimes, it’s hard to explain (the risks) to business stakeholders or even legal stakeholders,” he said.
For AI platforms, the biggest challenges are navigating the complex legal environment without infringing on copyright, while also ensuring transparency and addressing ethical considerations regarding model training data.
Content creators and original copyright holders need to pay attention to balancing rights protections with the economic incentives TDM exemptions offer to AI companies. Also, creators and copyright holders should evaluate whether commercial TDM exemptions erode creative industries or open new opportunities to create works and generate value.
Mammen said, “For policymakers and regulators, the challenge is to balance the interests of all these stakeholders. They want to create an environment that supports both creativity and innovation, while safeguarding intellectual property in a way that doesn’t cause unintended consequences.”
Regulators and lawmakers also need to keep up with the rapid pace of technological development—a daunting task.
“The biggest implication is that I don’t know what the implications are,” Lam said. “It’s up to each institution to figure out what meta-risks they are willing to take.”
Key Takeaways
- Evaluate Your IP Portfolio: Conduct a thorough review of your existing intellectual property assets to identify areas of vulnerability or potential misuse.
- Understand Relevant TDM Exemptions: Stay informed about the specific text and data mining (TDM) exemptions applicable in jurisdictions where your business operates to ensure compliance and strategic advantage.
- Establish Clear Data Use Policies: Develop internal policies for handling third-party data, ensuring proper licensing, and minimizing the risk of infringing on others' copyrights.
- Seek Legal Advice: Engage legal professionals experienced in intellectual property and AI-related issues to adapt your strategies to evolving laws and best practices.
- Invest in IP Monitoring Tools: Leverage technology solutions to monitor the use of your content and proactively address potential infringements.
- Stay Proactive About Compliance: Regularly update your understanding of legal developments in AI and copyright to remain ahead of potential regulatory changes.
- Engage in Advocacy and Industry Discussions: Participate in discussions influencing policy development to ensure your voice is heard and your business interests are represented in shaping future copyright regulations.