Ask ChatGPT about comedian Sarah Silverman’s memoir, “The Bedwetter,” and the AI chatbot can provide a detailed summary of the book. This raises the question: did ChatGPT legitimately “read” the book, or did it illegally scrape information from pirated copies and customer reviews? Silverman recently filed a copyright infringement lawsuit against OpenAI, the maker of ChatGPT, claiming that the company used her book without permission. Similar lawsuits have been filed against other AI developers, such as Meta (the parent company of Facebook and Instagram). These cases shed light on the valuable data used to train generative AI products that create new text, images, and music. The ethical and legal foundations of these tools are being called into question, especially as they are projected to contribute trillions of dollars to the global economy.
Matthew Butterick, one of the lawyers representing Silverman and other authors in a class-action case, claims that the machine learning industry’s use of book data obtained from illicit sites is an open secret. OpenAI and Meta have declined to comment on the allegations. However, legal battles against tech giants have been difficult for authors to win, as the challenge to Google’s online book library showed. In 2016, the U.S. Supreme Court let stand lower court rulings that Google’s digitization and partial display of books did not amount to copyright infringement. Deven Desai, an associate professor of law and ethics, believes that what OpenAI has done with books may be legally permissible based on the Google Books precedent.
While only a few authors, including Sarah Silverman, Mona Awad, and Paul Tremblay, have filed lawsuits, concerns about exploitative practices in the AI-building industry are growing within the literary and artistic communities. Prominent authors, such as Nora Roberts, Margaret Atwood, Louise Erdrich, and Jodi Picoult, signed an open letter to the CEOs of AI developers, accusing them of using the authors’ language, style, and ideas without compensation. They argue that, with billions being spent on developing AI technology, it is only fair to compensate writers for the use of their copyrighted works. Large language models, like ChatGPT, Google’s Bard, and Microsoft’s Bing chatbot, have learned from analyzing vast amounts of text, including books. OpenAI has acknowledged the value of books in training its models.
Originally, OpenAI’s GPT-1 model relied on the Toronto Book Corpus, which included unpublished books. Books are considered crucial for high-quality language models because their writing is well edited and coherent. However, top AI developers, including OpenAI, have become increasingly secretive about the sources of their data. There is circumstantial evidence suggesting that shadow libraries containing pirated content were used, including works from Silverman and other authors. Joseph Saveri, one of Silverman’s lawyers, stated that the other side has not yet provided an explanation of how ChatGPT ingested Silverman’s books.
It remains to be seen how OpenAI will formally respond to the lawsuit, but if the case proceeds, tech executives may be required to testify about the sources of the books used in their models. Authors are not necessarily demanding that tech companies abandon their algorithms and training data, but they believe that some form of compensation is necessary. The U.S. Federal Trade Commission has previously compelled companies to destroy AI data that was obtained unlawfully. The outcome of these lawsuits will have significant implications for the future of AI development and the protection of authors’ rights.