YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research

Updated Dataset

YouTube-8M Segments was released in June 2019 with segment-level annotations. Human-verified labels for about 237K segments across 1000 classes were collected from the validation set of the YouTube-8M dataset. Each video again comes with time-localized frame-level features, so classifier predictions can be made at segment-level granularity.

YouTube-8M was updated in May 2018 to include higher-quality, more topical annotations, and to clean up the annotation vocabulary. A number of low-frequency or low-quality labels and associated videos were removed, resulting in a smaller but higher-quality dataset (5.6M videos, 3862 classes). Additionally, the video IDs in the TensorFlow Record files have been anonymized, and the mapping to the real YouTube IDs will be periodically updated to exclude any videos that have been subsequently deleted (while preserving their anonymized features).

Dataset versions:

Jun 2019 version (current): 230K human-verified segment labels, 1000 classes, 5 segments/video
May 2018 version (current): 6.1M videos, 3862 classes, 3.0 labels/video, 2.6B audio-visual features
Feb 2017 version (deprecated): 7.0M videos, 4716 classes, 3.4 labels/video, 3.2B audio-visual features
Sep 2016 version (deprecated): 8.2M videos, 4800 classes, 1.8 labels/video, 1.9B visual-only features

Download

We offer the YouTube-8M dataset for download as TensorFlow Record files. We provide a downloader script that fetches the dataset in shards and stores them in the current directory (the output of pwd). The script can be restarted if the connection drops, in which case it downloads only the shards that have not been downloaded yet. We also provide HTML index pages listing all shards, in case you would like to download them manually. There are two versions of the features: frame-level and video-level. The dataset is made available by Google LLC under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Starter Code

Starter code for the dataset can be found on our GitHub page. In addition to training code, you will also find Python scripts that compute standard evaluation metrics for comparing models.

Note that this starter code is tested only against the latest version. Minor changes may be required to use it on older versions because of data format changes (e.g., the video_id vs. id field, feature names, and the number of classes in each version).

The code for feature extraction can be found in the MediaPipe GitHub YouTube8M example. It extracts both audio and visual features from videos in a single graph.

Segment-rated frame-level features dataset (NEW)

Only frame-level features are available for the YouTube-8M Segments dataset. Each example contains the labels and features of a video in tensorflow.SequenceExample format. The labels in the segment dataset use the same label mapping as the YouTube-8M video-level dataset. The features field uses the same format as the previous version of the YouTube-8M frame-level features dataset.

context: {
  feature: {
    key  : "id"
    value: {
      bytes_list: {
        value: (Video id)
      }
    }
  }
  feature: {
    key  : "labels" # video-level labels.
    value: {
      int64_list: {
        value: [ 441, 525 ]
      }
    }
  }
  feature: {
    key  : "segment_start_times"
    value: {
      int64_list: {
        value: [ 40, 30, 50, 65, 90 ]
      }
    }
  }
  feature: {
    key  : "segment_end_times"
    value: {
      int64_list: {
        value: [ 45, 35, 55, 70, 95 ]
      }
    }
  }
  feature: {
    key  : "segment_labels"
    value: {
      int64_list: {
        value: [ 525, 525, 525, 525, 525 ]
      }
    }
  }
  feature: {
    key  : "segment_scores"
    value: {
      float_list: {
        value: [ 0.0, 0.0, 0.0, 0.0, 1.0 ]
      }
    }
  }
}
feature_lists: {
  # See the frame-level features section.
}
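
The four segment_* lists in the context appear to be parallel: entry i of each list describes the same segment. A minimal sketch of pairing them up in plain Python, using the values from the example above, might look like the following. The Segment type and the pair_segments helper are hypothetical names introduced here for illustration, and reading a score of 1.0 as a positive rating is an assumption rather than something the format description states.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One rated segment (hypothetical helper type, not part of the dataset API)."""
    start: int
    end: int
    label: int
    score: float


def pair_segments(starts, ends, labels, scores) -> List[Segment]:
    """Zip the four parallel context lists into one record per segment."""
    if not (len(starts) == len(ends) == len(labels) == len(scores)):
        raise ValueError("segment lists must all have the same length")
    return [Segment(s, e, l, sc) for s, e, l, sc in zip(starts, ends, labels, scores)]


# Values copied from the example proto above.
segments = pair_segments(
    starts=[40, 30, 50, 65, 90],
    ends=[45, 35, 55, 70, 95],
    labels=[525, 525, 525, 525, 525],
    scores=[0.0, 0.0, 0.0, 0.0, 1.0],
)

# Assumption: a score of 1.0 marks a segment rated positive for its label.
positives = [seg for seg in segments if seg.score == 1.0]
print(positives)
```

In the example values every segment spans 5 time units, which matches the start and end lists shown above.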

To download the YouTube-8M Segments dataset, please use our Python download script. This assumes that you have Python and curl installed.

To download the Frame-level dataset using the download script, navigate your terminal to a directory where you would like to download the data. For example:

mkdir -p ~/data/yt8m/frame; cd ~/data/yt8m/frame

Then download the segment-level validation data and test data.

curl data.yt8m.org/download.py | partition=3/frame/validate mirror=us python
curl data.yt8m.org/download.py | partition=3/frame/test mirror=us python

The above uses the us mirror. If you are located in Europe or Asia, please swap the mirror flag us with eu or asia, respectively.
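
If you script your downloads, the mirror swap described above can be captured in a small helper. The following is only a sketch that builds the command strings shown in this section; it does not run them, and download_command and VALID_MIRRORS are hypothetical names introduced here.

```python
VALID_MIRRORS = ("us", "eu", "asia")  # the three mirrors named in the text


def download_command(partition: str, mirror: str = "us") -> str:
    """Return the shell pipeline for one partition (hypothetical helper)."""
    if mirror not in VALID_MIRRORS:
        raise ValueError(f"unknown mirror {mirror!r}; expected one of {VALID_MIRRORS}")
    return f"curl data.yt8m.org/download.py | partition={partition} mirror={mirror} python"


# Print the two segment-level commands for the eu mirror.
for split in ("validate", "test"):
    print(download_command(f"3/frame/{split}", mirror="eu"))
```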

Frame-level features dataset

Frame-level features are stored as tensorflow.SequenceExample protocol buffers. A tensorflow.SequenceExample proto is reproduced here in text format:

context: {
  feature: {
    key  : "id"
    value: {
      bytes_list: {
        value: (Video id)
      }
    }
  }
  feature: {
    key  : "labels"
    value: {
      int64_list: {
        value: [1, 522, 11, 172]  # label list
      }
    }
  }
}

feature_lists: {
  feature_list: {
    key  : "rgb"
    value: {
      feature: {
        bytes_list: {
          value: [1024 8bit quantized features]
        }
      }
      feature: {
        bytes_list: {
          value: [1024 8bit quantized features]
        }
      }
      ... # Repeated for every second, up to 300
    }
  }
  feature_list: {
    key  : "audio"
    value: {
      feature: {
        bytes_list: {
          value: [128 8bit quantized features]
        }
      }
      feature: {
        bytes_list: {
          value: [128 8bit quantized features]
        }
      }
      ... # Repeated for every second, up to 300
    }
  }

}
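
Each frame's bytes_list value holds one byte per feature dimension: 1024 bytes for rgb and 128 for audio. A minimal sketch of turning one such byte string into integer codes and then into floats is shown below, using only the Python standard library and synthetic data. The decode_frame and dequantize helpers are hypothetical names. The dequantization range of [-2, 2] and the half-bin bias are assumptions modeled on what the starter code's dequantization helper is believed to use, so verify them against the starter code before relying on the values.

```python
from typing import List


def decode_frame(raw: bytes, expected_dim: int) -> List[int]:
    """Turn one frame's byte string into a list of integers in 0..255."""
    if len(raw) != expected_dim:
        raise ValueError(f"expected {expected_dim} bytes, got {len(raw)}")
    return list(raw)  # iterating over bytes yields ints


def dequantize(values: List[int], min_value: float = -2.0, max_value: float = 2.0) -> List[float]:
    """Map 8-bit codes back to floats.

    Assumption: codes 0..255 cover [min_value, max_value] linearly, shifted
    by half a quantization bin.
    """
    value_range = max_value - min_value
    scale = value_range / 255.0
    bias = value_range / 512.0 + min_value
    return [v * scale + bias for v in values]


# Synthetic stand-in for one second of "audio" features (128 bytes).
fake_audio_frame = bytes(range(128))
codes = decode_frame(fake_audio_frame, expected_dim=128)
floats = dequantize(codes)
print(len(floats), floats[0], floats[-1])
```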

The total size of the frame-level features is 1.53 Terabytes. They are broken into 3844 shards which can be subsampled to reduce the dataset size.

To download the frame-level features, you have the following options:

Manually download all 3844 shards from the frame-level training, frame-level validation, and frame-level test partitions. You may also find it useful to download a handful of shards first (see details below), start developing your code against them, and kick off the larger download in parallel.
Use our Python download script. This assumes that you have Python and curl installed.

To download the Frame-level dataset using the download script, navigate your terminal to a directory where you would like to download the data. For example:
mkdir -p ~/data/yt8m/frame; cd ~/data/yt8m/frame
Then download the training, validation, and test data. Note: Make sure you have 1.53 TB of free disk space to store the frame-level feature files. Download the entire dataset as follows:
curl data.yt8m.org/download.py | partition=2/frame/train mirror=us python
curl data.yt8m.org/download.py | partition=2/frame/validate mirror=us python
curl data.yt8m.org/download.py | partition=2/frame/test mirror=us python
The above uses the us mirror. If you are located in Europe or Asia, please swap the mirror flag us with eu or asia, respectively.

To download 1/100-th of the training data from the US use:
curl data.yt8m.org/download.py | shard=1,100 partition=2/frame/train mirror=us python
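
To plan disk space for a subsample, a rough estimate can be derived from the total size quoted above. The sketch below assumes that shards are roughly equal in size and that 1.53 Terabytes means decimal terabytes; neither is stated in this section. Because the 1.53 TB figure covers all frame-level data, the result is an upper bound for a 1/100 subsample of the training partition alone. The estimate_subsample_gb helper is a hypothetical name.

```python
TOTAL_FRAME_LEVEL_BYTES = 1.53e12  # 1.53 TB as quoted above (decimal TB assumed)


def estimate_subsample_gb(denominator: int, total_bytes: float = TOTAL_FRAME_LEVEL_BYTES) -> float:
    """Estimate the size in GB of a 1/denominator subsample (equal-size shards assumed)."""
    if denominator < 1:
        raise ValueError("denominator must be at least 1")
    return total_bytes / denominator / 1e9


size_gb = estimate_subsample_gb(100)
print(f"a 1/100 subsample needs at most about {size_gb:.1f} GB")
```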

Video-level features dataset

Video-level features are stored as tensorflow.Example protocol buffers. A tensorflow.Example proto is reproduced here in text format:

features: {
  feature: {
    key  : "id"
    value: {
      bytes_list: {
        value: (Video id)
      }
    }
  }
  feature: {
    key  : "labels"
    value: {
      int64_list: {
        value: [1, 522, 11, 172]  # label list
      }
    }
  }
  feature: {
    # Average of all 'rgb' features for the video
    key  : "mean_rgb"
    value: {
      float_list: {
        value: [1024 float features]
      }
    }
  }
  feature: {
    # Average of all 'audio' features for the video
    key  : "mean_audio"
    value: {
      float_list: {
        value: [128 float features]
      }
    }
  }
}
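
For multi-label training, the labels list is commonly turned into a fixed-length 0/1 target vector. The following sketch does this in plain Python for the label list from the example above. It assumes that label values are zero-based indices into the 3862-class vocabulary of the May 2018 version, which this section does not state explicitly; multi_hot is a hypothetical helper name.

```python
NUM_CLASSES = 3862  # class count of the May 2018 version, per the version list


def multi_hot(labels, num_classes=NUM_CLASSES):
    """Return a 0/1 vector with a 1 at every label index."""
    vector = [0] * num_classes
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes})")
        vector[label] = 1
    return vector


target = multi_hot([1, 522, 11, 172])  # label list from the example proto
active = [i for i, v in enumerate(target) if v]
print(len(target), active)
```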

The total size of the video-level features is 31 Gigabytes. They are broken into 3844 shards which can be subsampled to reduce the dataset size. Similar to above, we offer two download options:

Manually download all 3844 shards from the video-level training, video-level validation, and video-level test partitions. You may also find it useful to download a handful of shards first, start developing your code against them, and kick off the larger download in parallel.
If you are located in Europe or Asia, please replace us in the URL with eu or asia, respectively, to speed up the transfer of the files.
Use our Python download script. For example:
mkdir -p ~/data/yt8m/video; cd ~/data/yt8m/video

curl data.yt8m.org/download.py | partition=2/video/train mirror=us python
curl data.yt8m.org/download.py | partition=2/video/validate mirror=us python
curl data.yt8m.org/download.py | partition=2/video/test mirror=us python
If you are located in Europe or Asia, please swap the mirror flag us with eu or asia, respectively.

To download 1/100-th of the training data from the US use:
curl data.yt8m.org/download.py | shard=1,100 partition=2/video/train mirror=us python