YouTube-BB Dataset | Google Research

YouTube-BoundingBoxes Dataset

YouTube-BoundingBoxes is a large-scale dataset of video URLs with densely sampled, high-quality single-object bounding box annotations.

The dataset consists of approximately 380,000 video segments, each 15 to 20 seconds long, extracted from 240,000 different publicly visible YouTube videos. The videos were automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera.

All of these video segments were human-annotated with high-precision classifications and bounding boxes at 1 frame per second.
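As a rough illustration of what annotation at 1 frame per second implies for a single segment, the sketch below lists the timestamps at which frames would be annotated. The function name `annotation_timestamps_ms`, the millisecond units, and the choice to include both segment endpoints are assumptions made for this example, not documented behavior of the dataset.

```python
def annotation_timestamps_ms(start_ms: int, end_ms: int, fps: float = 1.0) -> list[int]:
    """Return timestamps (in ms) sampled at `fps`, from start_ms up to and including end_ms.

    Hypothetical helper: the real dataset may align its samples differently.
    """
    if end_ms < start_ms:
        raise ValueError("end_ms must not precede start_ms")
    if fps <= 0:
        raise ValueError("fps must be positive")
    step_ms = round(1000 / fps)  # 1000 ms between samples at 1 fps
    return list(range(start_ms, end_ms + 1, step_ms))


# A hypothetical 15 s segment that starts 42 s into a video.
stamps = annotation_timestamps_ms(42_000, 57_000)
print(len(stamps), stamps[0], stamps[-1])  # 16 42000 57000
```

Under these assumptions a 15 s segment yields 16 annotated frames and a 20 s segment yields 21, so each segment carries on the order of 15 to 20 annotated frames.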

Our goal with the public release of this dataset is to help advance the state of the art of machine learning for video understanding.

10.5 Million

Human Annotations

The dataset consists of 10.5 million human annotations on video frames.

5.6 Million

Bounding Boxes

The dataset contains 5.6 million tight bounding boxes around tracked objects in video frames.

240,000

Videos

The approximately 380,000 video segments, each 15 to 20 seconds long, are extracted from 240,000 different publicly visible YouTube videos.

95%

Label Accuracy

The use of a cascade of increasingly precise human annotators ensures a measured label accuracy above 95% for every class and tight bounding boxes around the tracked objects.

23

Types of Objects

The objects tracked in the video segments belong to 23 different classes.
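The headline figures above can be cross-checked with some back-of-the-envelope arithmetic. The sketch below uses only the rounded numbers quoted on this page, so the results are approximate; the variable names are illustrative.

```python
# Rounded headline figures quoted on this page (approximate by construction).
SEGMENTS = 380_000
VIDEOS = 240_000
ANNOTATIONS = 10_500_000
BOXES = 5_600_000

boxes_per_segment = BOXES / SEGMENTS
annotations_per_segment = ANNOTATIONS / SEGMENTS
segments_per_video = SEGMENTS / VIDEOS

print(f"{boxes_per_segment:.1f} boxes per segment")              # 14.7
print(f"{annotations_per_segment:.1f} annotations per segment")  # 27.6
print(f"{segments_per_video:.1f} segments per video")            # 1.6
```

Roughly 14.7 boxes per segment is in line with annotating segments of 15 to 20 seconds at 1 frame per second, if one assumes that some annotated frames carry no box (for example, when the object is not visible). That reading is an interpretation of the numbers, not a statement from the dataset authors.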

This dataset is licensed by Google Inc. under a Creative Commons Attribution 4.0 International License.

If you have questions about the dataset or its use, or would like to be notified of updates, please subscribe to youtube-bb-users@.