
YouTube-BoundingBoxes Dataset

YouTube-BoundingBoxes is a large-scale data set of video URLs with densely-sampled high-quality single-object bounding box annotations.

The data set consists of approximately 380,000 video segments, each 15 to 20 seconds long, extracted from 240,000 different publicly visible YouTube videos. The segments were automatically selected to feature objects in natural settings without editing or post-processing, and their recording quality is often akin to that of a hand-held cell phone camera.

All of these video segments were human-annotated with high-precision classifications and bounding boxes at 1 frame per second.
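
As a rough sanity check on scale, the figures quoted on this page (about 380,000 segments, 15 to 20 seconds each, annotated at 1 frame per second) imply a range for the number of annotated frames. The sketch below is only back-of-the-envelope arithmetic; the function and constant names are illustrative and not part of any dataset tooling.

```python
# Back-of-the-envelope estimate using only the figures quoted on this page.
SEGMENTS = 380_000                 # approximate number of video segments
MIN_SECONDS, MAX_SECONDS = 15, 20  # stated range of segment lengths
ANNOTATION_FPS = 1                 # annotations per second of video


def annotated_frame_range(segments, min_s, max_s, fps):
    """Return (low, high) estimates of annotated frames across all segments."""
    return segments * min_s * fps, segments * max_s * fps


low, high = annotated_frame_range(SEGMENTS, MIN_SECONDS, MAX_SECONDS, ANNOTATION_FPS)
print(f"{low:,} to {high:,} annotated frames")  # 5,700,000 to 7,600,000
```

The low end of this estimate is close to the 5.6 million bounding boxes reported for the data set.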

Our goal with the public release of this dataset is to help advance the state of the art of machine learning for video understanding.

10.5 Million
Human Annotations
The data set contains 10.5 million human annotations on video frames.
5.6 Million
Bounding Boxes
The data set contains 5.6 million tight bounding boxes around tracked objects in video frames.
240,000
Videos
The video segments were extracted from 240,000 different publicly visible YouTube videos.
95%
Label Accuracy
The use of a cascade of increasingly precise human annotators ensures a measured label accuracy above 95% for every class and tight bounding boxes around the tracked objects.
23
Types of Objects
The objects tracked in the video segments belong to 23 different classes.

This dataset is licensed by Google Inc. under a Creative Commons Attribution 4.0 International License.

If you have questions about the dataset or its use, or would like to be notified of updates, please subscribe to youtube-bb-users@.
