YouTube-BoundingBoxes is a large-scale dataset of video URLs with densely sampled, high-quality single-object bounding box annotations.
The dataset consists of approximately 380,000 video segments, each 15-20 s long, extracted from 240,000 different publicly visible YouTube videos. The segments were automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera.
All of these video segments were human-annotated with high-precision classifications and bounding boxes at 1 frame per second.
Our goal with the public release of this dataset is to help advance the state of the art of machine learning for video understanding.
The dataset consists of 10.5 million human annotations on video frames.
The dataset contains 5.6 million tight bounding boxes around tracked objects in video frames.
The dataset consists of 380,000 15-20s video segments extracted from 240,000 different publicly visible YouTube videos, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera.
The use of a cascade of increasingly precise human annotators ensures a measured label accuracy above 95% for every class and tight bounding boxes around the tracked objects.
The objects tracked in the video segments belong to 23 different classes.
This dataset is licensed by Google Inc. under a Creative Commons Attribution 4.0 International License.
If you have questions about the dataset or its use, or would like to be notified of updates, please subscribe to youtube-bb-users@.