On June 23, 2025, Senior Judge William Alsup of the Northern District of California issued a highly anticipated summary judgment opinion in Bartz v. Anthropic PBC. It is one of the first of many expected decisions addressing infringement claims by copyright holders against generative AI companies such as Anthropic, OpenAI and Perplexity.
In Bartz, the plaintiffs are authors whose books were used to train Anthropic’s popular Claude LLMs. In its decision, the court drew a sharp distinction between two issues: whether using copyrighted books to train a generative AI system constituted fair use, and whether the manner in which those books were obtained and stored was lawful.
The court granted summary judgment for Anthropic on the fair use question, finding that training Claude on copyrighted books was permissible under 17 U.S.C. § 107. However, it denied Anthropic’s motion for summary judgment with respect to the company’s practice of acquiring and retaining a large, centralized internal library of pirated books.
While the opinion is not binding on other courts, it may well prove influential, offering a useful early framework for how judges, regulators and companies approach copyright compliance in this rapidly developing area.
Training LLMs on Books Is Protected by Fair Use
The court ruled that Anthropic’s use of books to train its LLMs qualified as fair use. Judge Alsup’s analysis focused on the first and fourth statutory fair use factors: the purpose and character of the use, and the effect on the market for the original works.
When evaluating the first factor – the purpose and character of the use – courts often look to whether the accused use is “transformative.” The Court did not mince words on this point, nor did it treat the factor as a close case. Judge Alsup wrote that the “purpose and character of using works to train LLMs was transformative – spectacularly so.” In another passage, the Court described the training process as “quintessentially transformative”: “Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them – but to turn a hard corner and create something different.” The Court was careful to note that Claude does not output full books or lengthy excerpts, which would have presented a different case.
As to the fourth factor – market harm – the Court rejected the plaintiffs’ argument that training Claude on their books displaced, or will displace, an emerging market for licensing copyrighted materials for AI training. This is a notable aspect of the decision because such a licensing market does, in fact, exist. The Court reasoned, however, that “such a market for that use is not one the Copyright Act entitles Authors to exploit” – suggesting that lost AI-training licensing revenue, standing alone, may not establish market harm under the fourth factor.
Significantly, Judge Alsup assumed for purposes of the motion that Claude “memorized” parts of the books in training but held that this did not, in itself, negate fair use. He also distinguished between the training process and specific outputs, making clear that his decision did not address whether Claude might, in specific instances, output infringing content.
This portion of the decision is significant and could lay the groundwork for further judicial approval of the methods used to train LLMs and thus to power tools like ChatGPT and Claude.
Retention of Pirated Works as a “Library” Was Not Protected by Fair Use
While the court was receptive to fair use in the context of training, it flatly rejected Anthropic’s defense of its acquisition and long-term storage of over seven million pirated books. These books were downloaded from shadow library sites like Pirate Library Mirror and Library Genesis and stored in a central repository that Anthropic employees could access for model training and internal research. Notably, the Court relied in part on an internal Anthropic email in which an employee was tasked with obtaining “all the books in the world” while avoiding as much “legal/practice/business slog” as possible.
The Court held that this retention was not fair use. Judge Alsup emphasized that the books were not merely downloaded temporarily for specific training runs, but were kept long-term, indexed, and made broadly available within the company. As the Court put it, “This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use,” such as training an LLM. The Court further noted that prior case law demonstrates “why Anthropic is wrong to suppose that so long as you create an exciting end product, every ‘back-end step, invisible to the public,’ is excused.” The Court also distinguished the Google Books ruling on the ground that the books Google scanned in that case were lawfully acquired. By contrast, the Court found that Anthropic’s separate practice of purchasing millions of print books, scanning them and destroying the print copies did constitute fair use.
The decision thus sends a clear signal that copyright compliance in the AI context turns not only on how works are used, but also on how they are acquired and handled internally. This has practical consequences for companies building or using training datasets: even if the end use of the material is transformative, the court found that acquiring copyrighted works through illicit means, and storing them in an internal library for future reference, is a standalone violation of the Copyright Act.
How This May Influence Other AI Copyright Cases
While Judge Alsup’s ruling is not binding outside the Northern District of California, it may influence how other courts think about copyright and AI, particularly in the high-profile lawsuits consolidated in the Southern District of New York. That multidistrict litigation (MDL) includes claims by authors, newspapers and other content creators against OpenAI.
Judge Alsup’s decision provides a framework that other courts may find persuasive: it emphasizes the transformative nature of training while drawing a bright line against the acquisition of pirated materials. Still, it would be premature to assume that courts in the Southern District of New York will follow suit. The MDL judges may view the issues differently, particularly where plaintiffs can point to more specific evidence of an existing market for licensing copyrighted materials.
Key Compliance Takeaways
For in-house counsel overseeing AI development or procurement, the Bartz ruling underscores the importance of proactive copyright risk management. Most notably, the decision makes clear that how you acquire and handle training materials is just as important as how your models ultimately use them.
This places a premium on cleaning training data. Companies should consider carefully auditing the sources of their training corpora, removing unauthorized works, documenting material provenance and, where possible, seeking licenses or relying on compliant data providers. Vendor contracts with GenAI providers should also be reviewed to ensure that datasets provided for AI training do not contain unauthorized copyrighted content, and that indemnities cover potential copyright exposure.