[co-authors: Michael Palmisciano, Laura Christeson, and Owen Feig]
Since generative AI began its rapid ascent in 2022, the creative, tech, and legal industries have grappled with a fundamental question: does using copyrighted works to train AI models violate the rights of creators, or does it fall within the bounds of fair use?
The U.S. District Court for the Northern District of California recently weighed in on this issue in two closely watched cases involving the use of copyrighted books to train large language models (“LLMs”). In both cases, the court granted the defendant AI companies a limited victory, finding that the use of the plaintiffs’ books was transformative, typically the most important and often determinative element of the fair use analysis.
But the rulings came with caveats. In one case, the court left open the possibility of infringement where pirated works were used. In the other, the judge effectively outlined a stronger argument for future plaintiffs: market harm caused by the AI-generated outputs that such training makes possible. These rulings underscore that while courts may view AI training as transformative, companies must proceed cautiously or risk significant liability.
Bartz v. Anthropic
On June 23, 2025, in Bartz v. Anthropic PBC, the court granted partial summary judgment in favor of Anthropic, finding that its use of purchased books to train its large language model, Claude, was transformative and qualified as fair use. However, the court declined to rule on whether the use of pirated books that were also part of Anthropic’s training data was lawful, leaving that issue for trial.
The plaintiffs, a group of authors, alleged that Anthropic had infringed their copyrights by using their books without authorization to train its AI systems. According to the complaint, Anthropic had both pirated millions of books from online sources and purchased physical books in bulk, which it then disassembled and digitized. The resulting digital library was used to train Claude on a subset of the collected content.
The court evaluated these claims under the four-factor fair use test: (1) the purpose and character of the use, including whether it is transformative; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used; and (4) the effect of the use on the market for the original.
In response to the authors’ claim that Claude’s “memorization” of their works was not transformative, the court analogized the process to a human learning from reading literary works. Just as humans internalize what they read and generate new thoughts based on what they have learned, the AI system learns from what it “reads” in the LLM database and sufficiently transforms the works into a new product when it generates outputs.
As for Anthropic’s digitization of the books into its LLM database, the court separated its analysis between the purchased and pirated books. For the purchased books, the court found that merely converting physical copies into digital form did not infringe copyright, citing precedent that allows for format shifting when the purchaser owns the copy.
In contrast, the court ruled that the use of pirated books may constitute infringement. Because the works were illegally obtained and harmed the authors by displacing sales, copy for copy, this use could not be excused as fair use. The fact that Anthropic later purchased copies of the same books did not absolve it from liability for infringement, though it may lessen potential damages. The issue of whether this use infringed the authors’ copyrights will proceed to trial.
Kadrey v. Meta
On June 25, 2025, in Kadrey v. Meta Platforms, Inc., the same district court ruled on cross-motions for partial summary judgment that Meta’s use of copyrighted material to train its AI was transformative and fair use. Yet the court emphasized that had the plaintiffs shown sufficient evidence of market harm or dilution—considerations under the fourth factor of the fair use test—the decision could easily have gone the other way.
In this case, thirteen authors, including Sarah Silverman and Ta-Nehisi Coates, alleged that Meta had violated their copyrights when it downloaded their books and used them to train Meta’s LLM, alliteratively called Llama. As in Bartz, the court recognized that training LLMs to perform diverse functions is a highly transformative use of copyrighted works. However, the judge in Kadrey placed greater weight on the fourth factor (market impact) and criticized the Bartz court for “blowing off” this piece of the analysis.
The plaintiffs brought two specific arguments regarding market harm. The first was that Meta’s AI models could reproduce small pieces of their copyrighted works. The second was that Meta’s use of unauthorized copies of the works reduced the authors’ ability to license their works to Meta for this purpose. The court rejected both arguments, finding that reproduction of small pieces of the works was too insignificant to cause harm, and that the loss of hypothetical licensing revenue was not legally cognizable in a case like this where works were used for a transformative purpose.
In an interesting turn, the judge suggested that a more compelling theory of market harm might lie in the outputs of the AI. If Llama, trained on the authors’ copyrighted works, can instantaneously generate competing works, it could flood the market with AI-generated content. This could dilute the market for human-authored works and undermine the incentive to create, which copyright law is designed to protect. But since the plaintiffs here did not make this argument, the judge ruled for Meta.
Comments
Although these decisions are short-term victories for the defendant AI companies, the long-term implications are far from clear. The opinions were narrowly tailored and heavily fact-dependent, and each specifically noted that it should not be interpreted as blanket support for LLMs. The judges agreed that the use of copyrighted works to train the LLMs was transformative, but they also recognized that this factor is not necessarily determinative of fair use. In cases where the input works are pirated or there is substantial evidence of market dilution, plaintiffs may have the upper hand.
In the short time since these cases were decided, a group of authors filed a class action against OpenAI and Microsoft in the Northern District of California, seeking to protect their own copyrights and those of other authors whose works were used by the defendants. Class actions elevate the risk for companies that engage in piracy or fail to purchase licensing rights from copyright holders.
The downside risk for AI companies that get it wrong is potentially astronomical. With millions of copyrighted inputs, damages awards—including, in cases of willful infringement, up to $150,000 in statutory damages per work—could accumulate quickly. In the wake of these rulings, and despite the courts’ agreement about the transformative nature of the use, we anticipate a continued trend of AI companies licensing content from publishers and creators for training purposes to insulate against liability.
Over the next several years, we expect many of these decisions to be appealed, and ultimately for the U.S. Supreme Court to provide much needed clarity. The Bartz and Kadrey decisions are only the prologue to a much longer copyright and AI litigation story.