At a daylong forum held Tuesday in Washington, D.C., several veterans of AI companies discussed how their experiences led them to new ventures aimed at establishing ethically trained AI models. The forum, dubbed "The Story Starts With Us," addressed the threat generative AI poses to copyright law and the creative industries, and was co-hosted by the Association of American Publishers and the Copyright Alliance.
Ed Newton-Rex, the founder of Fairly Trained—a U.K.-based nonprofit organization focused on ensuring ethical practices in the training of generative AI models—made a case for licensing creative works used to train AI systems and argued that current practices harm creators' livelihoods and society itself.
“I have worked in what we now call generative AI for 14 years, both at small startups and big tech companies, and so I know better than most that the technology and the vision behind generative AI are amazing,” Newton-Rex said. “But it’s my strong belief that stealing the work of the world’s creators to build it is not.”
While AI companies spend vast sums on engineers and computing power—up to $1 million per engineer and $1 billion per AI model—Newton-Rex explained that “many of them expect to get the third resource—training data—for free.” He emphasized that this training data represents “the life's work of the world's musicians, artists, writers, and other creators.” Newton-Rex is a choral music composer as well as an AI pioneer, having founded Jukedeck, one of the earliest AI music generators.
He went on to argue that most AI companies currently do not license the majority of their training data, instead using web scrapers to collect content. He cited research from the Mozilla Foundation showing that 64% of large language models released between 2019 and 2023 were trained on datasets containing copyrighted works.
“Companies from OpenAI to Anthropic to [the AI music engine] Suno have trained on copyrighted work in this way,” Newton-Rex said. “It's unfortunately and very, very rapidly become standard practice across much of the AI industry.”
He pointed out that “AI generated content directly competes with the creators whose work they were trained on.” He provided several examples, including filmmakers who have said they will use AI music in all future projects, and artist Kelly McKernan, whose income reportedly fell by 33% after their work was included in training data for the AI image generator Midjourney.
A study by freelance platform Upwork further supports his position, showing that job postings for writing tasks decreased by 8% following the introduction of ChatGPT, with an 18% drop for more complex writing assignments. Similarly, graphic design job postings fell by 18% after the introduction of AI image generators.
While AI companies often claim fair use protections, Newton-Rex disagreed, stating that “there's no way the fair use exception can be used to justify the mass exploitation of creative work to create models that compete with the creators behind those works.”
He also rejected comparisons between AI training and human learning. “Artists have been learning from each other for centuries.... This is baked in, it's part of the social contract of being a creator,” he said. “But with generative AI, companies worth millions or billions of dollars scrape as much content as they can, often against creators' wishes...creating a highly scalable competitor to those creatives.”
He added that the things AI might be used for to truly benefit society, such as scientific discovery, require different types of input. “If you look at something like AlphaFold, it was not trained on creative works. It was trained on protein structures,” he said. “There have been no major scientific discoveries that have come from an AI model that has trained on people's music, people's writing, on creative, copyrighted work.”
Certifying ethically trained AI
Newton-Rex said he founded Fairly Trained to certify AI models that don’t use copyrighted works without permission. (Maria Pallante, the CEO of the AAP, is an advisor to Fairly Trained.) He pointed to several companies already taking this approach, including his own work at Stability AI, where he led the development of Stable Audio, which was “trained only on music that we had licensed” and was named one of Time magazine's best inventions of 2023.
He pointed out that unlawful training is having deleterious effects not just on creators’ incomes and livelihoods, but on the open internet. “Looking at the top 14,000 websites used for AI training, [a report] found that over the course of a single year, the number of sites that were restricted by things like terms of service or robots.txt rose from three percent to between 20 and 33 percent,” he said.
According to Newton-Rex, the implications of this trend are vast: “The web is being gradually closed due to unlicensed training, and this is bad for new AI models, for new entrants to the market, but also to everyone—consumers, researchers, and everyone who benefits from an open internet.”
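The restrictions Newton-Rex describes are typically expressed in a site's robots.txt file, which asks crawlers by name to stay away. As a hedged illustration only (GPTBot and CCBot are crawler tokens publicly documented by OpenAI and Common Crawl, respectively; any specific site's actual file will differ), a publisher blocking AI training crawlers while still permitting search indexing might serve something like:

```
# Refuse known AI training crawlers site-wide
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Continue to allow traditional search indexing
User-agent: Googlebot
Allow: /
```

Compliance with robots.txt is voluntary on the crawler's part, which is why sites are increasingly pairing it with terms-of-service restrictions, as the report Newton-Rex cites notes.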
He also noted that some creators are stopping publishing online because “they know AI companies are going to take their work and compete with them.” He characterized this as “disincentivizing publishing—the exact opposite of what copyright law is there to do.”
The majority of the public appears to support Newton-Rex’s position opposing unlicensed AI training. He cited a survey by the AI Policy Institute indicating that 66% of respondents believed AI companies should not be allowed to train on publicly available data without permission, and 74% believed rights holders should be paid for such use.
To demonstrate the creative community’s agreement, Newton-Rex organized the Statement on AI Training, which has garnered nearly 50,000 signatures from creators, ranging from author James Patterson to Radiohead frontman Thom Yorke. The statement was simple: “The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted.”
This February, he coordinated a protest album in the U.K. with contributions from more than 1,000 British musicians. The album, called Is This What We Want, consisted of recordings of empty studios and performance spaces to argue against the U.K. government’s proposed changes to release AI companies from obligations to pay creators for their works. The track titles spelled out the phrase: “The U.K. government must not legalize music theft to benefit AI companies.”
“What these creators are telling us is that this exploitation of their work is totally unjust, is a potentially catastrophic threat to their profession,” Newton-Rex said.
He concluded his talk in Washington by calling for a mutually beneficial relationship between the creative and AI industries, arguing that licensing “inflicts some kind of extra burden on AI companies, but ultimately they will reach exactly the same point—models that are just as capable, just as powerful—and they'll do so without forcing publishers to batten down the hatches and without setting the world's creatives increasingly against them.”