Authoring AI – How the Copyright Act of 1976 Protects Authored Works From Machine Learning Databases

Since its public launch in 2022, OpenAI’s ChatGPT has become one of the most used data tools on the market.¹ Its seemingly endless knowledge has greatly increased efficiency in many industries and inspired many other AI tools, including Google’s chatbot Bard.² However, its method of data gathering has been the subject of many controversies and legal challenges, the newest of which come from authors who allege that their works were fed to ChatGPT without compensation. As large language models continue to develop, courts may have to review their outputs to prevent copyright infringement. This article will discuss the Copyright Act of 1976, the newest lawsuits, and how the precedents set in these cases could be applied to ever-changing technology.

Copyright Infringement at a Glance

Copyright infringement occurs when a copyrighted work “is reproduced, distributed, performed, publicly displayed, or made into a derivative work without the permission of a copyright owner.”³ Authors under copyright law are “the creator[s] of the original expression in a work.”⁴ They own the copyright unless it has been assigned in writing to another party, such as a publisher.⁵ So, for example, the author of a book or play has exclusive rights to its publication or derivative use and can unilaterally decline to distribute, perform, or license it or any part of it for the duration of the copyright. The U.S. Copyright Act of 1976 gives the federal courts exclusive jurisdiction over infringement claims,⁶ and provides the basis for the complaints described in this article: Authors Guild v. OpenAI, Chabon et al. v. OpenAI, and The New York Times Co. v. Microsoft Corp. et al.⁷

How ChatGPT Works

ChatGPT uses generative pre-trained transformer models (otherwise known as GPT models, hence “ChatGPT”) created by OpenAI, the company behind ChatGPT.⁸ ChatGPT gains its knowledge primarily from physical inputs, the internet, documents, and open-source internet content.⁹ During training, the model works through all of this data to learn the patterns in what it is being presented with. Using a transformer architecture, which weighs the most important words in a sentence to identify patterns, ChatGPT answers complex and simple questions almost instantaneously.¹⁰ In short, ChatGPT draws on a scrape of online information and its other inputs, and then produces answers to virtually any problem posed within seconds.
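The idea of weighing the most important words in a sentence can be illustrated with a toy sketch of an attention calculation. This is not OpenAI’s code: the three words, their two-number vectors, and the function names are all made up for illustration, and real models use learned vectors with thousands of dimensions. The sketch only shows the general technique of scoring each word against a query word and turning the scores into weights that sum to one.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical two-dimensional word vectors, chosen only for this example.
words = ["the", "author", "sued"]
vectors = {"the": [0.1, 0.0], "author": [0.9, 0.4], "sued": [0.5, 0.8]}

# How much weight does the word "sued" give to each word in the sentence?
weights = attention_weights(vectors["sued"], [vectors[w] for w in words])
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

In this toy setup the filler word “the” receives the smallest weight, which is the sense in which a transformer “pulls out” the words that matter most for a given context.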

The Lawsuits in Question

In September 2023, a class action lawsuit was filed against OpenAI in the Southern District of New York, alleging copyright violations.¹¹ The plaintiffs are seventeen fiction authors, including George R.R. Martin (A Game of Thrones), Jodi Picoult (My Sister’s Keeper), and John Grisham (The Exchange).¹² The suit was filed through the Authors Guild, and alleges that despite OpenAI being able to purchase the works or “pay a reasonable licensing fee,” the company completely evaded the Copyright Act by accessing databases of books to feed its machine learning systems.¹³ The complaint contends that this constitutes copyright infringement, and the Guild wants to enjoin OpenAI “from infringing Plaintiffs’ and class members’ copyrights” and “from using Plaintiffs’ and class members’ copyrighted works in ‘training’ Defendants’ large language models without express authorization.”¹⁴

In October, a similar lawsuit against OpenAI was filed in the Northern District of California.¹⁵ The case, Chabon et al. v. OpenAI, Inc., includes authors Ta-Nehisi Coates and Jacqueline Woodson, among other plaintiffs.¹⁶ The main issue raised is that OpenAI allegedly uses copyrighted works without authorization to train its language models, which can then generate in-depth analyses of the themes in an author’s copyrighted works.¹⁷ All of the plaintiffs in this case have exclusive rights to their works under the Copyright Act of 1976.¹⁸ The complaint alleges that OpenAI used BookCorpus, a free dataset of over 11,000 books created to train language models.¹⁹ The primary controversy with BookCorpus is that it does not compensate authors and most of the works it publishes are under copyright.²⁰ The plaintiffs claim that they “did not consent to the use of their copyrighted works as training materials for GPT models” and OpenAI “benefit[s] commercially and profit[s] handsomely from unauthorized and illegal use of the . . . copyrighted works.”²¹ These authors are seeking declaratory relief, a permanent injunction against OpenAI, and damages.²²

At the end of December 2023, another critical copyright infringement claim was filed in the Southern District of New York. The New York Times, the second largest daily newspaper in the United States, heavily relies on intellectual property laws and negotiated licensing agreements to protect its ability to produce quality journalism.²³ The company filed a complaint against Microsoft and OpenAI, alleging ChatGPT “copies and uses millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more.”²⁴ The lawsuit also alleges that OpenAI did not legally license the content and that ChatGPT reproduces Times articles verbatim.²⁵ The Times alleges that this is more harmful than something like a Google summary because the tools “undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue.”²⁶ While the complaint does not list a specific requested damages amount, legal experts estimate it could fall between $2.25 billion and $450 billion, which would make it the largest intellectual property infringement award ever.²⁷

Potential Remedies for These Plaintiffs

In general, U.S. copyright law allows for temporary or final injunctions, actual damages, and statutory damages for injured parties.²⁸ Injunctions do not seem like a feasible option for the lawsuits above because the only way to remove a portion of a finished AI model’s learning base is by re-training the algorithms from scratch.²⁹ This would effectively shut down ChatGPT entirely and force OpenAI to start with a new model, which seems like too much of a burden.

On the other hand, the Copyright Act lays out set amounts for statutory damages in 17 U.S.C. § 504(c)(1)–(2). Statutory damages can range from $750 to $150,000 per work infringed, with the upper end of that range reserved for willful infringement. Courts will have to determine when copying occurs for language models: does copying occur every time the model responds to someone’s search, or does copying only occur when the information is fed to the model? The answer could vary the potential damages greatly.
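A rough arithmetic sketch shows how sensitive the exposure is to what gets counted. It uses only the $750 and $150,000 figures above and the estimate of 3 million copied articles cited in footnote 27; the function name is hypothetical, and the calculation ignores everything a court would actually weigh, such as willfulness, registration, and how works are grouped.

```python
from typing import Tuple

# Statutory damages bounds discussed above, in dollars per unit counted.
MIN_PER_UNIT = 750
MAX_PER_UNIT = 150_000

def statutory_damages_range(units: int) -> Tuple[int, int]:
    """Return the (lowest, highest) total for a given number of counted units."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return units * MIN_PER_UNIT, units * MAX_PER_UNIT

# Estimate of Times articles copied, taken from footnote 27.
low, high = statutory_damages_range(3_000_000)
print(f"${low:,} to ${high:,}")  # $2,250,000,000 to $450,000,000,000
```

These totals match the $2.25 billion to $450 billion estimate quoted for the New York Times case, and any change in the unit being counted scales both ends of the range proportionally.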

The Potential Precedent of these Current Lawsuits

The cases above argue that AI usage has contravened the right of authors to complete autonomy over the use of their copyrighted work. According to the complaints, OpenAI made a unilateral decision to feed books and newspaper articles to ChatGPT and used them to generate derivative, incredibly detailed summaries. AI has also touched a multitude of media besides print: challenges have been raised in music³⁰ and art³¹ as well. In determining whether OpenAI violated copyright law, the courts may look at how OpenAI acquired the information fed to ChatGPT and how much of ChatGPT’s model is based on this information. Most of the information at issue allegedly came from pirated sources, which directly takes money out of authors’ pockets.³² An author’s main form of compensation is royalties, so when their work is pirated, they lose revenue directly.³³

The courts are being invited to set a precedent that authored material is protected against machine learning reproduction. Legal challenges related to machine learning and copyright law present cases of first impression as ChatGPT and similar AI tools are still new and unfamiliar. To continue to incentivize creators across all media, copyright needs to protect them from unauthorized acquisition, reproduction, and uses that deprive creators of profit and control over their work. That includes legal protections against machine learning forged from that work. The risk is that the law fails to anticipate how AI tools will continue to evolve, and that rulings today may stunt useful future developments in machine learning.

Written by: Selma Jay
Selma is a 2025 J.D. Candidate at Brooklyn Law School

Article Sources

¹ ChatGPT reached 100 million users in January 2023, just two months after its launch. For reference, TikTok took nine months and Instagram took two-and-a-half years to reach 100 million users. Krystal Hu, ChatGPT sets record for fastest-growing user base, Reuters, https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/.
² Google Bard was unveiled to the public February 6, 2023 and released March 21, 2023. Sabrina Ortiz, What is Google Bard? Here’s everything you need to know, ZDNet, https://www.zdnet.com/article/what-is-google-bard-heres-everything-you-need-to-know/.
³ Definitions (FAQ), U.S. Copyright Off., https://www.copyright.gov/help/faq/faq-definitions.html [https://perma.cc/6PUD-4M8R].
⁴ Id.
⁵ Id.
⁶ 17 U.S.C. § 301(a) (2023).
⁷ The Authors Guild, John Grisham, Jodi Picoult, David Baldacci, George R.R. Martin, and 13 Other Authors File Class-Action Suit Against OpenAI, The Authors Guild, https://authorsguild.org/news/ag-and-authors-file-class-action-suit-against-openai/; Chabon v. OpenAI, BakerHostetler, https://www.bakerlaw.com/chabon-v-openai/; Adam Clark Estes, How copyright lawsuits could kill OpenAI, VoxMedia, https://www.vox.com/technology/2024/1/18/24041598/openai-new-york-times-copyright-lawsuit-napster-google-sony.
⁸ Id.
⁹ Alex Hughes, ChatGPT: Everything you need to know about OpenAI’s GPT-4 tool, BBC Science Focus, https://www.sciencefocus.com/future-technology/gpt-3.
¹⁰ How Does ChatGPT Work?, Atria Innovation, https://www.atriainnovation.com/en/how-does-chat-gpt-work/#:~:text=GPT%20(Generative%20Pre%2Dtraining%20Transformer,sentence%2C%20using%20transformations%20and%20attention.
¹¹ Authors Sue Open AI, owner of Chat GPT over copyright infringement, WBUR, https://www.wbur.org/hereandnow/2023/09/22/open-ai-lawsuit-authors# [https://perma.cc/8QMJ-4YB5].
¹² Id.
¹³ Comp., 2, Authors Guild et al. v. OpenAI Inc. et al., No. 1:23-cv-8292 (S.D.N.Y.).
¹⁴ Id. at 46.
¹⁵ Amend. Comp., 2, Chabon et al. v. OpenAI, Inc. et al., No. 3:23-cv-04625 (N.D. Cal.).
¹⁶ Id.
¹⁷ Id.
¹⁸ Id. at 23.
¹⁹ Id. at 10.
²⁰ Id.
²¹ Id.
²² Id. at 25.
²³ Comp., 3, The New York Times Co. v. Microsoft Corp. et al., No. 1:23-cv-11195 (S.D.N.Y.).
²⁴ Id. at 2.
²⁵ Id.
²⁶ Id. at 3.
²⁷ The Times estimates 3 million articles were copied from its database, which is how these numbers were calculated. Thomas Carey, The New York Times v. OpenAI: The Biggest IP Case Ever, Sunstein LLP, https://www.jdsupra.com/legalnews/the-new-york-times-v-openai-the-biggest-5149037/.
²⁸ 17 U.S.C. §§ 501 – 504 (2023).
²⁹ Stephen Pastis, A.I.’s un-learning problem: Researchers say it’s virtually impossible to make an A.I. model ‘forget’ the things it learns from private user data, Fortune, https://fortune.com/europe/2023/08/30/researchers-impossible-remove-private-user-data-delete-trained-ai-models/#.
³⁰ Universal Music Group is currently suing Anthropic PBC, developer of chatbot Claude, for copying and “disseminating lyrics to music controlled by [UMG].” Comp., 20, Concord Music Group, Inc. v. Anthropic PBC, No. 3:23-cv-01092 (M.D. Tenn.).
³¹ There is currently a class action lawsuit against Stability AI, with plaintiffs arguing AI models were trained using their copyrighted images to generate new art without credit. Amend. Comp., 1, Sarah Andersen et al. v. Stability AI Ltd. et al., No. 3:23-cv-00201-WHO (N.D. Cal.).
³² Julia Rittenberg & Kelly Main, What is Copyright? Everything You Need to Know, Forbes Advisor, https://www.forbes.com/advisor/business/what-is-copyright; see also Digital Piracy & Copyright Infringement, IIPRD Consulting, https://www.iiprd.com/digital-piracy-copyright-infringement/ (stating that third-party online services that allow file-storing and sharing for free are a form of digital piracy).
³³ How do authors get paid?, The Society of Authors, https://www.societyofauthors.org/Where-We-Stand/buying-choices/How-do-authors-get-paid [https://perma.cc/8WJ7-E2YP].
