5/30/2025

Judge Alsup: LLM training may be fair use, but acquisition still matters



Kyle Coogan

Counsel


Copyright lawsuits relating to artificial intelligence are increasing, and courts are weighing in. In a recent federal copyright suit in the Northern District of California, Bartz et al. v. Anthropic PBC, U.S. District Judge William Alsup pointed to an emerging legal distinction in how copyright infringement law may apply to the training of large language models (LLMs). During a May 22, 2025 hearing, Judge Alsup stated that he was “[i]nclined to say [Anthropic] did violate the Copyright Act, but that the subsequent uses [of copyrighted material] were fair use.” He drew a line between how an LLM’s training data is obtained and how that data is later used, including in the outputs the model provides in response to prompts, suggesting that courts may begin to treat the acquisition of training data separately from what an LLM generates.

Judge Alsup emphasized that training an LLM on copyrighted material may be protected by fair use, provided the material is not acquired in a way that infringes the exclusive rights granted to copyright holders under the Copyright Act. Anthropic argues that using copyrighted material to train its AI assistant, Claude, qualifies as fair use because the training process serves a fundamentally different purpose from that of the original works: producing a model that generates new, non-substitutive content.

However, the manner in which Anthropic obtained its training material remains a central issue. Plaintiffs allege that Anthropic infringed their copyrights by downloading their books from websites offering pirated content and then incorporating those unauthorized copies into Claude’s training corpus. They argue that this method of acquisition undermines any fair use defense. Obtaining copyrighted works without authorization, whether through piracy or scraping from restricted-access websites, raises unresolved legal questions that could have significant implications for how LLMs are trained.

Judge Alsup’s reasoning highlights a key technical and legal feature of generative AI: models like Anthropic’s Claude are trained on massive datasets, but they do not store those training materials. Rather, the materials are used to adjust the model’s weights and improve its predictive capabilities. The use of copyrighted materials in these datasets may therefore qualify as fair use, to the extent that using them to adjust a model’s parameters (its weights) could be considered transformative and non-substitutive.

According to Judge Alsup, even if the use of such materials for training qualifies as fair use, improper acquisition (such as unlicensed downloading or unauthorized scraping) may still give rise to liability under the Copyright Act, just as it would outside the context of LLM training. This focus on data sourcing suggests that LLM developers may eventually be required to license or audit their training data more rigorously. The strength of future fair use defenses may depend not only on how an LLM behaves, but also on how its training data was obtained.

Following the hearing, Judge Alsup requested supplemental briefing from the parties on a well-known Second Circuit decision often cited by AI developers. In that case, the court held that scanning books to create a searchable database and display limited snippets of copyrighted material qualified as fair use because it served a transformative, non-substitutive purpose. Judge Alsup’s request suggests he is weighing whether similar reasoning, which permits reproduction for a functional, transformative purpose, may also apply to LLM training.

The next phase in this case will likely test the limits of that analogy. Depending on how Judge Alsup rules on summary judgment, the case may proceed to trial or could set the stage for appellate questions that may shape AI copyright law for years to come.
