The rapid rise of generative AI has thrust the question of “fair use” into the legal spotlight, especially as courts grapple with whether using copyrighted works to train AI models without permission is lawful. High-profile lawsuits—like those brought by novelists against tech giants—have focused on whether feeding large volumes of creative works into foundation models (the broad, general-purpose AI systems) for training is fair use under copyright law. While the law remains unsettled, these cases are critical for authors, publishers, and other creators whose works are widely used in foundational AI training.
But for many companies, the real value lies not in mass-market creative content, but in specialized, proprietary materials—think clinical guidelines in healthcare, claims data in insurance, or operational and performance data in data center management. These types of copyrighted works may not be as useful for training general-purpose AI, but they are extremely valuable for post-training or “fine-tuning” models to excel in specific, high-value tasks. As AI adoption accelerates, the question becomes: can companies use such proprietary data for fine-tuning without a license, or does copyright law stand in the way?
Unlicensed Use of Proprietary Data for AI Fine-Tuning Likely Is Not Fair Use
The reasoning of recent court decisions suggests that unlicensed use of proprietary, copyrighted data for AI fine-tuning is unlikely to qualify as fair use. In Kadrey, the court reasoned that while the fair use analysis for foundation model training is complex and fact-specific, the most important factor is market harm. If AI outputs threaten to substitute for or compete with the original works, fair use is unlikely to apply. The court emphasized that the incentive to create is at the heart of copyright law, and allowing AI companies to use copyrighted works without permission—especially when the outputs could flood the market with competing content—would undermine that incentive.
Similarly, in Bartz, the court found that while using copyrighted works to train AI models is transformative, that alone does not automatically make it fair use. If the AI’s outputs closely mimic or substitute for the originals, or if the use harms the market for licensing those works, copyright holders’ rights are at risk.
Applying the fair use factors to the facts of those cases and the arguments the parties presented about using copyrighted works to train large language models, both courts concluded that the uses were fair use. But applying the same legal principles to proprietary, specialized data ideal for fine-tuning leads to a different result. Such data is often not widely available, and its value is tied to its exclusivity and the specialized markets it serves. If an AI company were to use this data without a license to fine-tune a model, the copyright owner would have a stronger claim for infringement. The courts have made it clear: the more the AI’s use threatens the market for the original work, the less likely fair use will apply.
Monetizing Proprietary Data in the AI Era
Given these legal realities, companies with valuable proprietary data should take proactive steps to protect and monetize their assets:
- Develop Licensing Programs: Create clear, tiered licensing structures for AI training and fine-tuning, tailored to different use cases and customer segments.
- Protect Data Assets: Use technical safeguards (such as watermarking or access controls) and robust contractual terms to prevent unauthorized use and monitor for infringement.
- Build Value-Added AI Products: Leverage your own data to develop specialized AI tools or partner with AI developers to co-create models, capturing more of the value chain.
- Engage in Collective Licensing: Join or help form industry consortia to streamline licensing, set standards, and ensure fair compensation for data use.
By understanding the legal landscape and taking strategic action, companies can transform their proprietary data from a passive asset into a powerful driver of innovation and revenue—while ensuring their intellectual property is respected in the age of AI.