From Input to Impact: The Market Harm Standard Emerging in AI Fair Use

Ballard Spahr LLP

Summary

Another federal court recently ruled that using copyrighted books to train artificial intelligence (AI) systems can qualify as fair use under the U.S. Copyright Act. This time, the court said that, because the issue of market dilution was so important, authors would have survived summary judgment had they presented any “meaningful evidence on market dilution at all.”

The Upshot

  • In Kadrey v. Meta Platforms, Inc., the federal court for the Northern District of California granted partial summary judgment to Meta. The court found that Meta’s use of copyrighted books to train its large language model (LLM) was transformative—satisfying a key prong of the fair use analysis.
  • Though the ruling aligns in this respect with Bartz v. Anthropic PBC—issued just days earlier in the same district—it diverges in others, notably in the Meta court's greater emphasis on another prong of the fair use analysis: market harm.
  • The role of “piracy”—the unauthorized acquisition of copyrighted works—is another point of divergence between the decisions, with the Meta court adopting a softer view of piracy in the context of fair use.

The Bottom Line

The Meta ruling elaborates on Anthropic’s framework for evaluating how courts may assess the sourcing, use, and storage of copyrighted materials in AI development. The Meta court urges closer scrutiny of how outputs affect the markets for the copyrighted works used to train the system, in addition to examining the sources of generative AI training inputs.

The ruling suggests that copyright holders may prevail on an infringement claim by showing that the AI training at issue is likely to result in a proliferation of competing works that reduce the commercial value of the original works—satisfying the “market harm” prong of the fair use analysis. The decision signals that future plaintiffs will need to tailor their claims to the specific features of their works—for example, sound bites, musical compositions, film, photographs, art, poetry, or voiceover—and the competitive dynamics of the markets in which those works are consumed.

When Judge Vince Chhabria of the Northern District of California granted summary judgment in favor of Meta Platforms, Inc. (Meta) on the plaintiffs’ copyright infringement claim on June 25, the court found that Meta’s use of copyrighted books to train its LLM, Llama, was a fair use under Section 107 of the Copyright Act. The ruling follows the recent decision in Bartz v. Anthropic PBC, where the court likewise found that training an LLM on copyrighted books qualified as fair use.

But Kadrey v. Meta took a distinct analytical path—rejecting Anthropic’s analogy between LLMs and student writers, offering a more detailed view of how market harm might arise, and treating the relevance of piracy within the fair use analysis in a notably different way.

The Meta court wrote that legal exposure may still arise if the model generates outputs that infringe protected works. In Meta, the works at issue were books, and the court placed weight on the fact that Llama did not produce outputs that replicated the plaintiffs’ original texts. But if the underlying training data involved music, film, photography, poetry, or voice recordings, the analysis—and the outcome—could be quite different.

Transformative Use Was Clear-Cut

As in Anthropic, the Meta court found that training a generative language model on copyrighted books served a new and different purpose: rather than replicating or distributing the works, the training process enabled the model to learn patterns in language and generate new content. The court had little trouble concluding that this factor weighed in favor of fair use. Meta’s alignment with Anthropic on this point suggests that, going forward, the transformative nature of LLM training may be less contested than other aspects of the fair use analysis.

A New Lens on Market Harm for LLM Training

The court found that the plaintiffs failed to present a viable theory of market harm. Both of their primary arguments—that they were deprived of the opportunity to license their works for AI training and that the models were capable of reproducing their content—were unpersuasive.

With respect to licensing, the court explained that harm to a market for licensing LLM training is not cognizable because it arises from the very use being challenged. Accepting that theory would render the fourth fair use factor circular: any defendant accused of infringement would, by definition, be undermining a licensing market for that same use. With respect to the alleged reproduction of the plaintiffs’ works, the court noted that the plaintiffs had provided no evidence that the models could generate substantial portions of their works, with outputs typically limited to brief excerpts of around 50 words.

Although the court granted summary judgment on the record before it, it went on to hypothesize how market harm might arise in future LLM cases. The decision emphasized that the most plausible form of harm is not traditional substitution, where an infringing work copies or closely mimics the original, but market dilution, in which AI-generated outputs crowd the same creative space and weaken demand for human-authored works. The court was careful to distinguish this from ordinary market saturation: the concern is not merely that generative AI can produce a high volume of similar material, but that it can do so precisely because it was trained on the very works it ends up displacing. In doing so, the court explicitly rejected the analogy drawn in Anthropic between LLM training and teaching children to write. It viewed that comparison as inapt, reasoning that the ability of LLMs to generate competing works at scale posed a distinctly different kind of market risk than the more attenuated risk that arises when human learners use copyrighted works to develop their own writing skills.

In the court’s view, the ability of LLMs to flood the market with similar but non-infringing works introduces a form of indirect substitution that copyright law must be flexible enough to recognize. It reduces the visibility, sales, and long-term viability of original works, not by copying them directly, but by extracting their value and redirecting it into algorithmically generated competitors. The court framed this market dilution framework as a natural evolution of the market harm inquiry and reasoned that the fair use analysis must be flexible enough to encompass new forms of indirect substitution that emerge from transformative technologies.

“Because the issue of market dilution is so important in this context, had the plaintiffs presented any evidence that a jury could use to find in their favor on the issue, factor four (market harm) would have needed to go to a jury,” Judge Chhabria wrote.

The court emphasized that the risk of market dilution is not uniform, and that certain types of works are more likely to face competitive pressure from LLM-generated content than others. For example, it suggested that magazine articles and nonfiction books, like how-to guides or biographies, could be especially vulnerable to market dilution if LLMs are used to generate comparable content at scale. Print journalism may face an even greater threat, given the potential for models to summarize or report on events in ways that compete directly with human-authored news. In contrast, the court noted that markets for well-known authors or highly distinctive works, such as memoirs or literary fiction, may be less affected, since readers often seek those titles out based on author identity, voice, or reputation. For fiction, the court indicated that the impact may depend on the genre, with categories like romance and spy novels being more likely to face competition from AI-generated alternatives.

Taken together, these observations reflect the court’s view that the risk of market harm from LLM training will vary depending on the nature of the copyrighted work and how it functions in the market. The decision signals that future plaintiffs will need to tailor their claims to the specific features of their works and the competitive dynamics of the markets in which those works are consumed.

Piracy Does Not Automatically Defeat Fair Use

The Meta decision adopted a more relaxed view of how piracy factors into the fair use analysis, concluding that unauthorized acquisition of copyrighted works is not necessarily dispositive when the use of those works is clearly transformative. In contrast, the Anthropic court concluded that training on pirated books was “inherently” and “irredeemably” infringing.

The court concluded that Meta’s downloading of books from pirate libraries did not preclude a finding of fair use. It reasoned that the downloading could not be evaluated in isolation, since it was done for the purpose of training LLMs, a use the court deemed highly transformative. The court explained that piracy may be relevant to a finding of bad faith under the first fair use factor but concluded that even if such a finding were made, it would not outweigh the highly transformative nature of the use. The court explained that bad faith would not be dispositive where the other statutory factors favor fair use. This view stands in contrast to Anthropic, where the court treated the use of pirated materials as integral to its denial of fair use. The divergence between the two decisions leaves unresolved how courts will treat unauthorized sourcing under the fair use framework, creating legal uncertainty for developers and for companies that rely on AI systems trained on datasets that may include unlicensed or pirated materials.

Guidance and Takeaways

The Meta ruling reinforces that courts are willing to treat LLM training as transformative but signals that fair use defenses will increasingly turn on whether the training creates a risk of diminishing the economic opportunities for the copyright-protected works (i.e., market dilution). Models that produce content in commercially dense genres, such as journalism, biography, or instructional nonfiction, may face greater legal scrutiny, especially if those outputs begin to compete with human-authored works.

For copyright holders, Meta offers (in dicta) a roadmap for how future claims might be framed: generic objections to unauthorized copying are unlikely to succeed without evidence of market impact. Plaintiffs will need to show that the market for their works will be meaningfully diluted by the proliferation of AI-generated content trained on those works—not merely that their content was copied or ingested.

Companies that license or deploy AI systems should reassess their risk exposure in light of the differing approaches taken in Anthropic and Meta. While the Meta court declined to treat piracy as dispositive to the fair use analysis, Anthropic reached the opposite conclusion—highlighting the lack of consensus on how sourcing practices should factor into the fair use framework. In this unsettled environment, contractual protections around training data provenance, indemnification, and output controls remain critical tools for managing legal and reputational risk.


DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations. Attorney Advertising.

© Ballard Spahr LLP
