YouTube-8M Dataset

YouTube-8M is a large-scale labeled video dataset that consists of 8 million YouTube video IDs and associated labels from a diverse vocabulary of 4800 visual entities. It also comes with precomputed state-of-the-art vision features from billions of frames, which fit on a single hard disk. This makes it possible to train video models on hundreds of thousands of hours of video in less than a day on a single GPU.

Our goal is to accelerate research on large-scale video understanding, representation learning, noisy data modeling, transfer learning, and domain adaptation approaches for video. More details about the dataset and initial experiments can be found in our technical report.

8 Million Video URLs
0.5 Million Hours of Video
1.9 Billion Frame Features
Avg. Labels / Video

Dataset Vocabulary

Each video can carry multiple labels. The labels are Knowledge Graph entities, organized into 24 top-level verticals. Each entity represents a semantic topic that is visually recognizable in video, and the video labels reflect the main topics of each video.

You can download a CSV file of our vocabulary. The line number of the label corresponds to its index in the dataset files, with the first label corresponding to index 0. The CSV file contains the following columns:

NumTrainVideos, KnowledgeGraphID, LabelName, FirstVertical, SecondVertical, ThirdVertical
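As an illustration of the index convention above, here is a minimal sketch of reading the vocabulary so that each label's position in the file becomes its dataset index, starting at 0. It assumes the CSV has no header row (pass has_header=True if yours does) and no blank lines between entries. The function name read_vocabulary, the Label tuple, and the sample rows are hypothetical and are not taken from the real file.

```python
import csv
import io
from typing import NamedTuple, Tuple

# Column order as listed on this page.
COLUMNS = ["NumTrainVideos", "KnowledgeGraphID", "LabelName",
           "FirstVertical", "SecondVertical", "ThirdVertical"]


class Label(NamedTuple):
    index: int                  # position in the file, starting at 0
    num_train_videos: int
    knowledge_graph_id: str
    name: str
    verticals: Tuple[str, ...]  # non-empty vertical columns, in order


def read_vocabulary(lines, has_header=False):
    """Parse vocabulary rows; the row position is the label index.

    Assumption: the file has no header row unless has_header is True.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header so the first label gets index 0
    labels = []
    # Assumption: no blank lines inside the file, so skipping empty rows
    # does not shift any index.
    rows = [row for row in reader if row]
    for index, row in enumerate(rows):
        # Pad short rows so missing vertical columns read as empty strings.
        row = row + [""] * (len(COLUMNS) - len(row))
        verticals = tuple(v for v in row[3:6] if v)
        labels.append(Label(index, int(row[0]), row[1], row[2], verticals))
    return labels


# Hypothetical sample rows, for illustration only.
SAMPLE = io.StringIO(
    "1000,/m/hypothetical1,Example Entity A,Vertical X,,\n"
    "400,/m/hypothetical2,Example Entity B,Vertical Y,Vertical X,\n"
)
vocab = read_vocabulary(SAMPLE)
index_to_name = {label.index: label.name for label in vocab}
```

With the real file, replace SAMPLE with an open file handle, for example open(path, newline=""), where path points to your downloaded copy.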

The entity frequencies are plotted below in log-log scale, which shows a Zipf-like distribution:

In addition, we show histograms with the number of entities and number of training videos in each top-level vertical:
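Both views can be approximated from the vocabulary columns. The following is a minimal sketch, not the method used to produce the plots on this page: zipf_slope fits a straight line to log(count) against log(rank), where a slope near -1 indicates a Zipf-like distribution, and per_vertical_totals counts entities and sums training videos per vertical. Both function names are hypothetical, the input data is synthetic, and attributing each entity only to its FirstVertical is an assumption.

```python
import math
from collections import Counter


def zipf_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    Needs at least two positive counts. A slope close to -1 is what a
    Zipf-like distribution looks like on a log-log plot.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def per_vertical_totals(rows):
    """rows: iterable of (num_train_videos, first_vertical) pairs.

    Assumption: each entity is attributed to its FirstVertical only.
    Returns (entities per vertical, training videos per vertical).
    """
    entities = Counter()
    videos = Counter()
    for num_videos, vertical in rows:
        entities[vertical] += 1
        videos[vertical] += num_videos
    return entities, videos


# Synthetic counts proportional to 1/rank, for illustration only.
synthetic_counts = [1_000_000 // rank for rank in range(1, 51)]
slope = zipf_slope(synthetic_counts)

# Hypothetical (NumTrainVideos, FirstVertical) pairs.
entities, videos = per_vertical_totals(
    [(1000, "Vertical X"), (400, "Vertical Y"), (50, "Vertical X")]
)
```

On real data, pass the NumTrainVideos column to zipf_slope and the (NumTrainVideos, FirstVertical) pairs to per_vertical_totals.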


This dataset is brought to you by the Video Understanding group at Google Research. More about us.
To stay up to date on this dataset, please subscribe to our Google Group: youtube8m-users. Use the group for discussions about the dataset and the starter code.

