Hidden Video IDs

To protect the privacy of our uploaders, we do not use YouTube IDs as part of our dataset. Instead, we associate each video with a randomly generated ID, which we store under the context feature named "id". Nonetheless, developers can look up the external YouTube ID using the randomly generated one, as long as the video remains public on YouTube. When a video is deleted or made private by its uploader, the lookup URL becomes invalid.

Translating Video IDs

The ID field in the TensorFlow record files is a 4-character string (e.g. ABCD). To get the YouTube ID, construct a URI like /AB/ABCD.js (note: the first 2 characters of the ID are repeated as the directory name!) and append it to the URL data.yt8m.org/2/j/i. As a real example, the ID nXSc can be converted to a video ID via the URL data.yt8m.org/2/j/i/nX/nXSc.js. The format of the file is JSONP, and should be self-explanatory.
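
The lookup described above can be sketched in Python as follows. This is a minimal illustration, not an official client. The function names are hypothetical, the http:// scheme is an assumption (the URL above is given without one), and the parser assumes the JSONP payload is a single function call whose last double-quoted argument is the YouTube ID. It also assumes that an invalid lookup URL shows up as an HTTP error. Inspect a real response before relying on any of this.

```python
import re
import urllib.error
import urllib.request
from typing import Optional

# Assumption: the text gives the host without a scheme; http:// is a guess.
BASE_URL = "http://data.yt8m.org/2/j/i"


def lookup_url(random_id: str) -> str:
    """Build the lookup URL for a 4-character dataset ID (e.g. nXSc)."""
    if len(random_id) != 4:
        raise ValueError("expected a 4-character ID, got %r" % random_id)
    # The first two characters of the ID are repeated as the directory name.
    return "%s/%s/%s.js" % (BASE_URL, random_id[:2], random_id)


def parse_jsonp(payload: str) -> str:
    """Return the last double-quoted string in the JSONP payload.

    Assumption: the payload looks like callback("<dataset id>", "<YouTube ID>").
    """
    strings = re.findall(r'"([^"]*)"', payload)
    if len(strings) < 2:
        raise ValueError("unexpected JSONP payload: %r" % payload)
    return strings[-1]


def fetch_youtube_id(random_id: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch the YouTube ID, or None if the lookup URL is no longer valid."""
    try:
        with urllib.request.urlopen(lookup_url(random_id), timeout=timeout) as resp:
            return parse_jsonp(resp.read().decode("utf-8"))
    except urllib.error.HTTPError:
        # Assumption: a deleted or private video yields an HTTP error status.
        return None
```

Calling lookup_url("nXSc") yields the URL from the example above with the assumed scheme prepended, and fetch_youtube_id("nXSc") performs the actual network request. Keeping URL construction and parsing separate from the network call makes both easy to check offline.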
