Who are we?
We are part of the Video Understanding group within the Machine Perception
organization at Google. We work on building computer vision and video understanding
systems at large scales, making it easier to find and discover great video content on
YouTube and the web, and helping personal video collections become useful, delightful, and
entertaining. Our long-term technology mission is to achieve the ability to understand and describe
video at the level of a human expert, purely from pixels and audio samples.
We created this dataset in order to advance computer vision and video understanding at large scale.
The videos were sampled to preserve the diverse distribution of popular YouTube content, the
annotation vocabulary was carefully constructed, and the features were designed so that a
million-hour video dataset fits on a single commodity hard disk. This makes it possible to download
the dataset to a local machine and train a full-scale model in less than a day on a single GPU! We feel that,
by giving researchers access to such a large labeled video dataset with precomputed features, we can
eliminate storage and computational barriers, and help accelerate research on large-scale video
understanding. We hope this dataset will spur exciting new advances in video modeling
architectures and representation learning, especially approaches that deal effectively with noisy or
incomplete labels, transfer learning, and domain adaptation. Our paper
includes details on how we collected the dataset, as well as experimental results for some
baseline video modeling and domain transfer approaches.
If you have questions about the dataset, or would like to be notified of updates, please subscribe to
our Google Group: youtube8m-users
The people who worked extensively to bring you this dataset:
* Former interns with our team who contributed to the dataset creation.