Investigations by Proof News and Wired revealed that major AI firms have used transcripts from nearly 175,000 YouTube videos without their creators’ permission. The transcripts were included without authorization in the Pile, a dataset compiled by EleutherAI that also contains content from Wikipedia, the European Parliament, and even emails from Enron. Major technology companies such as Apple and Salesforce have used the Pile to train their AI models. This use of YouTube transcripts violates YouTube’s terms of service because it involves automated scraping of video content and associated data.
The YouTube creators who were interviewed expressed surprise and anger at the unauthorized use of their content. Some were upset to learn that the Pile included even deleted videos and content from creators who had since removed their online presence. EleutherAI did not comment, but the incident highlights the potential conflict between EleutherAI’s mission to democratize access to AI technologies and the rights and interests of content creators and platforms. The situation shows the ethical and legal complexity of AI model training.
The whytry.ai article you just read is a brief synopsis; the original article can be found here: Read the Full Article…