Significant media coverage has highlighted the tech industry’s increasing desire to use published content to train AI large language models (LLMs), often without permission, raising concerns about intellectual property rights and fair compensation for content creators. In response, many publishers have successfully pursued license agreements with AI companies in a bid to turn this challenge into an opportunity.
The most recent examples in the news include HarperCollins’s just-announced three-year partnership with an unnamed company (reported to be Microsoft) that will allow select nonfiction backlist titles to be used for AI training on an author opt-in basis, with a fee of $5,000 per book to be split between the author and publisher. In the academic and professional publishing space, Wiley and Taylor & Francis have also struck multimillion-dollar agreements with AI firms.
Of course, some significant hurdles remain. For one, authors appear to be especially skeptical of empowering the very technology many believe could replace them. For another, many tech firms maintain that using unlicensed copyrighted works for AI training is fair use and completely legal, and several potentially landmark copyright lawsuits that could decide the question are still unfolding.
Amid such uncertainty, it is critical for publishers and authors to capitalize on the growing demand for high-quality training data by establishing business and legal frameworks that can protect the interests of authors and rights holders, and balance innovation with ethical considerations.
The following offers publishers some initial advice on how to approach the rapidly evolving AI licensing landscape.
First, know your value
Sales, literary merit, cultural impact, and contributions to human knowledge are all factors that can improve an AI’s functionality and increase its marketability. And as you would expect, different kinds of published works have different values for different AI companies.
Trade publishers, for example, focus on popular fiction and nonfiction, including bestsellers and works with significant cultural impact. For these publishers, licensing terms might prioritize high-profile titles that can enhance an AI model’s language and contextual understanding.
For STM publishers, the emphasis is on the quality, recency, and reliability of their content and data. Peer-reviewed articles, research papers, and up-to-date data are particularly valuable for AI models requiring precise and accurate information.
For educational publishers, the alignment of content with various educational levels, curriculum standards, and educational effectiveness is paramount. Licensing agreements might focus on comprehensive coverage of subjects, ensuring that AI tools can provide accurate and relevant content for educators.
In each case, understanding the value of your content is key. For example, specialized knowledge, particularly from authoritative sources, enhances an AI’s ability to generate accurate and insightful outputs. Recent publications are also more valuable for AI models, as timeliness can significantly impact the AI’s performance in generating current and relevant responses. High-quality writing improves the AI’s ability to generate natural, contextually appropriate language—particularly important for models designed to interact with users conversationally.
And of course, the quantity of books, words, or “tokens” (the fundamental unit of data that can be processed by an algorithm) is also an important factor. As with traditional collection licensing terms, a larger backlist will yield a larger payout. In fact, AI licenses often come down to a cents-per-token valuation.
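To make the cents-per-token idea concrete, here is a rough sketch of how a publisher might estimate the size of a backlist in tokens and apply a per-token rate. Everything specific in it is an assumption for illustration only: the tokens-per-word ratio (real tokenizers vary by model and by text), the sample titles and word counts, and the rate itself, which is not drawn from any actual deal.

```python
from dataclasses import dataclass

# Assumption: a rough tokens-per-word ratio. Actual counts depend on the
# tokenizer an AI company uses and on the text itself.
TOKENS_PER_WORD = 1.3


@dataclass
class Title:
    name: str
    word_count: int


def estimate_tokens(title, tokens_per_word=TOKENS_PER_WORD):
    """Approximate a title's token count from its word count."""
    return round(title.word_count * tokens_per_word)


def license_value(titles, cents_per_token):
    """Return an estimated license value in dollars for a list of titles."""
    total_tokens = sum(estimate_tokens(t) for t in titles)
    return total_tokens * cents_per_token / 100


# Hypothetical backlist and a hypothetical rate of half a cent per token.
backlist = [Title("Title A", 80_000), Title("Title B", 120_000)]
total = license_value(backlist, cents_per_token=0.5)
print(f"Estimated license value: ${total:,.2f}")
```

The point of the exercise is less the final figure than the sensitivity: doubling the backlist or the rate doubles the estimate, which is why knowing your own token count before negotiating is useful.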
Fairness first
As complex as establishing the value of your content to an AI company may be, it’s only one part of the task. Several equally complex issues must also be addressed.
One of the first steps in licensing content for AI training is verifying that you hold the rights to execute the license. For backlist titles with older contracts, publishers may have to go back to agents and authors to negotiate for rights that weren’t granted initially. (Interestingly, AI itself can assist in this process by scanning and analyzing contracts.) Ensuring that all agreements going forward cover AI-related uses is essential to avoid legal complications.
Developing a fair revenue allocation model is also vital. Traditional methods of sharing royalties based on sales are not directly applicable when content is used to train AI models. So, how to proceed?
One possibility is to consider historical sales performance when allocating licensing revenue back to authors. Bestselling titles might command higher fees due to their proven value and popularity. Assessing the lifetime value and ongoing relevance of titles can ensure that both publishers and authors receive fair compensation over time.
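As one illustration of a sales-weighted approach, the sketch below divides a licensing pool among titles in proportion to their historical unit sales, then splits each title's portion between author and publisher. The 50/50 split, the pool size, and the sales figures are all hypothetical placeholders, not a recommended or industry-standard formula.

```python
def allocate_by_sales(pool, sales_by_title, author_share=0.5):
    """Split a licensing pool across titles in proportion to unit sales.

    pool: total licensing revenue in dollars.
    sales_by_title: mapping of title name to historical unit sales.
    author_share: assumed fraction of each title's portion paid to the author.
    """
    total_sales = sum(sales_by_title.values())
    if total_sales <= 0:
        raise ValueError("total sales must be positive")

    allocation = {}
    for title, units in sales_by_title.items():
        # Each title's portion is its share of total historical sales.
        title_amount = pool * units / total_sales
        allocation[title] = {
            "author": round(title_amount * author_share, 2),
            "publisher": round(title_amount * (1 - author_share), 2),
        }
    return allocation


# Hypothetical figures for illustration only.
sales = {"Title A": 6_000, "Title B": 3_000, "Title C": 1_000}
result = allocate_by_sales(10_000.00, sales)
for title, parts in result.items():
    print(title, parts)
```

A publisher could swap the sales figures for another weighting, such as a lifetime-value estimate, without changing the structure of the calculation.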
Another innovative approach is to link AI outputs back to specific titles depending on context. This provides not only proper attribution but also significant marketing and impact benefits.
By linking AI outputs to titles, publishers can drive new interest and sales for backlist content, ensure that original works receive proper recognition, and promote the author’s and the publisher’s catalog. Additionally, this approach builds trust with LLM users, showing that AI-generated content is grounded in credible sources.
Other considerations
Transparency in licensing agreements is also essential to building trust with authors and stakeholders. Clearly outlining the terms and ensuring open communication helps maintain positive relationships and can avoid disputes down the line.
Addressing ethical concerns related to AI training—such as bias and misinformation—is also crucial. Publishers must ensure that their content is used responsibly and contributes positively to AI’s development. This involves setting enforceable guidelines and standards for the ethical use of content.
And given the rapid advancements in AI technology, it’s important to ensure that licensing agreements are flexible enough to accommodate future changes. This includes provisions for updating terms as new uses and technologies emerge to ensure long-term relevance and fairness.
Change has come
Finally, don’t wait. Licensing backlist content to AI companies presents exciting opportunities and significant challenges for publishers. But however one views AI at this moment, there is no question that the technology is fundamentally changing the way content is discovered, consumed, and, in some cases, created.
Advanced AI-driven search engines will soon be able to surface backlist content with unprecedented accuracy, connecting users to materials in ways traditional search methods could not. The dawn of the AI age may be daunting, but establishing a transparent, ethical, and future-proof licensing framework now can help publishers maximize the opportunities presented by AI while also safeguarding their interests.
Ken Brooks is the founder of the consulting firm Treadwell Media Group and a founding partner of Publishing Technology Partners.