
Hidden Video IDs

To protect the privacy of our uploaders, we do not use YouTube IDs as part of our dataset. Instead, we associate each video with a randomly generated ID, which we store under the context feature named "id". Nonetheless, developers can look up the external YouTube ID using the randomly generated one, as long as the video remains public on YouTube. When a video is deleted or made private by its uploader, the lookup URL becomes invalid.

Translating Video IDs

The ID field in the TensorFlow record files is a 4-character string (e.g. ABCD). To get the YouTube ID, construct a path of the form /AB/ABCD.js, where the directory name is the first 2 characters of the ID, and append it to the URL data.yt8m.org/2/j/i. As a real example, the ID nXSc can be converted to a video ID via the URL data.yt8m.org/2/j/i/nX/nXSc.js. The file is in JSONP format and should be self-explanatory.
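The URL construction described above can be sketched in Python. This is a minimal illustration, not an official client. The function names are hypothetical, and the "http://" scheme is an assumption, since the text gives only the host and path. The parser also assumes the JSONP payload is a callback whose arguments are double-quoted strings, which the page does not specify, so inspect a real response before relying on it.

```python
import re
import urllib.request
from typing import List

# Assumption: the scheme is not given in the text; only "data.yt8m.org/2/j/i" is.
BASE_URL = "http://data.yt8m.org/2/j/i"


def lookup_url(random_id: str) -> str:
    """Build the lookup URL for a 4-character dataset ID."""
    if len(random_id) != 4:
        raise ValueError("expected a 4-character ID, got %r" % random_id)
    # The first 2 characters of the ID form the directory name.
    return "%s/%s/%s.js" % (BASE_URL, random_id[:2], random_id)


def quoted_strings(jsonp: str) -> List[str]:
    """Return every double-quoted string in a JSONP payload, in order.

    Assumption: the payload passes its values as double-quoted string
    arguments to a callback.
    """
    return re.findall(r'"([^"]*)"', jsonp)


def fetch_jsonp(random_id: str) -> str:
    """Download the raw JSONP text for an ID.

    Raises urllib.error.HTTPError if the lookup URL is invalid, for
    example because the video was deleted or made private.
    """
    with urllib.request.urlopen(lookup_url(random_id)) as response:
        return response.read().decode("utf-8")
```

For the ID nXSc, `lookup_url("nXSc")` yields the path /nX/nXSc.js appended to the base URL, matching the example in the text. Passing the result of `fetch_jsonp` to `quoted_strings` lists the string values in the response, one of which should be the YouTube ID if the payload has the assumed shape.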