See Which Films & TV Series Were Used For Generative AI Training

The database contains more than 53,000 films and over 85,000 TV show episodes.

By now, it's hardly a secret that major AI developers train their models on whatever data they can get their hands on, with big players like OpenAI even openly admitting that their house-of-cards empires would not be possible without copyrighted content. Yet it is still virtually impossible to determine exactly which pieces of content were used as training material. Thanks to The Atlantic's Alex Reisner, however, not everything is doom and gloom: we now at least have insight into which Hollywood films and TV shows were scraped by big tech and used to develop AI chatbots and text-to-image generators.

A few days ago, Reisner published a comprehensive piece revealing that he had gained access to research papers and a dataset proving that many AI systems have been trained on the work of TV and film writers. According to his report, the dataset contains over 53,000 films and more than 85,000 TV show episodes, including, but certainly not limited to, every film nominated for Best Picture from 1950 to 2016, every episode of Breaking Bad, The Wire, and The Sopranos, over 600 episodes of The Simpsons, 170 episodes of Seinfeld, and 45 episodes of Twin Peaks.

As Reisner outlined, the database was created using subtitles from OpenSubtitles.org, a site that currently hosts more than 9 million subtitle files in over 100 languages and dialects. These subtitles were used by companies like Apple, Anthropic, Meta, NVIDIA, Salesforce, Bloomberg, and others to train AI systems such as the Claude chatbot, the NeMo Megatron LLMs, AI models for iPhones, and approximately 140 other models.

Thankfully, Reisner also provided access to the dataset through a simple tool that lets you check whether your favorite show was used as food for AI algorithms: enter the name of a movie or TV series and click a single button. The tool is available via the original report on The Atlantic.

One of the series featured in the database is the beloved Gravity Falls, a fact that didn't escape the attention of Alex Hirsch, the show's creator, who suggested adding an absolutely brilliant "heartwarming moral" to the end of each episode:

Moreover, Hirsch has provided perhaps the most spot-on explanation for why, even now in November 2024, data scraping is still allowed, and why publications still have to dance around the issue, unable to describe the use of copyrighted work for AI training as "stealing" or "theft" without risking defamation lawsuits from multi-billion-dollar giants:

Speaking of generative artificial intelligence, less than a week ago, Netflix found itself under fire for extending a poster for Arcane Season 2 with AI, something Riot Games' brand lead described as "disrespectful to the incredible artists who worked on the show."

And prior to that, OpenAI got lambasted online for describing digital artists' use of anti-scraping software like Glaze and Nightshade as "abuse," indirectly suggesting that the tools do indeed work.

Comments 1

  • Anonymous user

    Oh my stars and garters!  At least I can rest easy knowing all the writers and directors I admire have never studied, learned from or been inspired by the works of other creators in their fields like dirty plagiarists.