YouTube-BoundingBoxes is a large-scale dataset of video URLs with densely sampled, high-quality single-object bounding box annotations.
The dataset consists of approximately 380,000 video segments, each 15-20 s long, extracted from 240,000 different publicly visible YouTube videos. The segments were automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera.
All of these video segments were human-annotated with high-precision classifications and bounding boxes at 1 frame per second.
Our goal with the public release of this dataset is to help advance the state of the art of machine learning for video understanding.
The dataset consists of 10.5 million human annotations on video frames.
The dataset contains 5.6 million tight bounding boxes around tracked objects in video frames.
The use of a cascade of increasingly precise human annotators ensures a measured label accuracy above 95% for every class and tight bounding boxes around the tracked objects.
The objects tracked in the video segments belong to 23 different classes.
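To make the annotation structure concrete, the following Python sketch models one bounding box annotation and the 1 frame per second sampling described above. It is only an illustration under stated assumptions: the record layout, the field names (`video_id`, `timestamp_ms`, `class_name`, `xmin`, `xmax`, `ymin`, `ymax`), the normalized [0, 1] coordinate convention, and the helper `expected_timestamps_ms` are hypothetical and are not taken from the dataset's published file format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Annotation rate stated in the dataset description: 1 frame per second.
SAMPLE_RATE_HZ = 1


@dataclass(frozen=True)
class BoxAnnotation:
    """One single-object bounding box on one video frame (hypothetical layout)."""

    video_id: str      # assumed: identifier of the source YouTube video
    timestamp_ms: int  # assumed: frame time within the video, in milliseconds
    class_name: str    # one of the 23 object classes
    # Assumed convention: box edges normalized to [0, 1] of frame width/height.
    xmin: float
    xmax: float
    ymin: float
    ymax: float

    def to_pixels(self, width: int, height: int) -> Tuple[int, int, int, int]:
        """Return (left, top, right, bottom) in pixels for a given frame size."""
        return (
            round(self.xmin * width),
            round(self.ymin * height),
            round(self.xmax * width),
            round(self.ymax * height),
        )


def expected_timestamps_ms(start_ms: int, duration_s: int) -> List[int]:
    """Timestamps at which a segment would be sampled at SAMPLE_RATE_HZ."""
    step_ms = 1000 // SAMPLE_RATE_HZ
    return [start_ms + i * step_ms for i in range(duration_s * SAMPLE_RATE_HZ)]
```

Under these assumptions, a 15 s segment would carry 15 sampled timestamps, and calling `to_pixels(640, 480)` on a `BoxAnnotation` maps its normalized box onto a 640x480 frame.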
This dataset is licensed by Google Inc. under a Creative Commons Attribution 4.0 International License.
If you have questions about the dataset or its use, or would like to be notified of updates, please subscribe to youtube-bb-users@.