Apparently, stealing other people’s work to build a product for money is now “fair use,” according to OpenAI, because they are “innovating” (stealing). Yeah. Move fast and break things, huh?
“Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” wrote OpenAI in the House of Lords submission.
OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”
The issue is that fair use is more nuanced than people think, and the bar for claiming it is higher when you are engaged in commercial activity. I’d more readily accept fair use arguments from research institutions, from companies that train and release their model weights (like Llama), or from some other activity with a clear tie to the public benefit.
OpenAI isn’t doing this work for the public benefit, regardless of the language of altruism they wrap it in. They, and Microsoft, are hoovering up other people’s data to build a for-profit product and make money. That’s really what it boils down to for me. And I’m fine with them making money. But pay the people whose data you’re using.
Now, in the US there is no case law on this yet, and it will take years to settle. But personally, philosophically, I don’t see how Microsoft taking NYT articles and turning them into a paid product is any different from Microsoft taking an open-source project that doesn’t allow commercial use and sneaking it into a product.