Hidden Video IDs
To protect the privacy of our uploaders, we do not use YouTube IDs as part of our dataset.
Instead, we associate each video with a randomly generated ID, which we store under the
context feature named "id". Nonetheless, developers can look up the external
YouTube ID using the randomly generated one, as long as the video remains public on YouTube.
When a video is deleted or made private by its uploader, the lookup URL becomes invalid.
Translating Video IDs
The ID field in the TensorFlow record files is a 4-character string (e.g.
ABCD). To get the YouTube ID, construct a URI like
/AB/ABCD.js (note that the first 2 characters appear twice: once as the
directory name and again at the start of the file name), and append it to the URL
data.yt8m.org/2/j/i. As a real example, the ID
nXSc can be converted to a video ID via the URL
data.yt8m.org/2/j/i/nX/nXSc.js.
The file is in JSONP format and should be self-explanatory.
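
The translation described above can be sketched in Python roughly as follows. This is a minimal illustration, not an official client: the function names are hypothetical, the "http://" scheme is an assumption (the text gives the host and path only), and the JSONP handling assumes that the response wraps its values as double-quoted strings, with the YouTube ID as the last one. Check a real response before relying on that assumption.

```python
import re
from typing import List, Optional
from urllib.error import URLError
from urllib.request import urlopen

# Host and path come from the text above; the "http://" scheme is an assumption.
BASE_URL = "http://data.yt8m.org/2/j/i"


def build_lookup_url(record_id: str) -> str:
    """Build the lookup URL for a 4-character dataset ID (hypothetical helper)."""
    if len(record_id) != 4:
        raise ValueError("expected a 4-character ID, got %r" % record_id)
    # The first two characters are used as the directory name, and the
    # full ID (which repeats them) is used as the file name.
    return "%s/%s/%s.js" % (BASE_URL, record_id[:2], record_id)


def extract_quoted_strings(jsonp_text: str) -> List[str]:
    """Return every double-quoted string found in a JSONP payload."""
    return re.findall(r'"([^"]*)"', jsonp_text)


def fetch_youtube_id(record_id: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch the lookup file and return the YouTube ID, or None if unavailable.

    Assumption: the last quoted string in the payload is the YouTube ID.
    """
    try:
        with urlopen(build_lookup_url(record_id), timeout=timeout) as response:
            text = response.read().decode("utf-8")
    except URLError:
        # Covers HTTP errors too, e.g. when the video was deleted or made private.
        return None
    strings = extract_quoted_strings(text)
    return strings[-1] if strings else None
```

For example, build_lookup_url("nXSc") produces the same path as the real example above, and fetch_youtube_id("nXSc") would perform the actual network request. Returning None for a failed lookup keeps batch jobs running when some videos are no longer public.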