
The 3rd Workshop on YouTube-8M Large-Scale Video Understanding

October 28th, 2019, Seoul, Korea (ICCV'19)



Many recent breakthroughs in machine learning and machine perception have come from the availability of large labeled datasets, such as ImageNet, which has millions of images labeled with thousands of classes and has significantly accelerated research in image understanding. Google announced the YouTube-8M dataset in 2016, which spans millions of videos labeled with thousands of classes, with the hope that it would spur similar innovation and advancement in video understanding. YouTube-8M represents a cross-section of our society, and was designed with scale and diversity in mind so that lessons we learn on this dataset can transfer to all areas of our lives, from learning, to communication, to entertainment. It covers over 20 broad domains of video content, including entertainment, sports, commerce, hobbies, science, news, jobs & education, and health.

Continuing from last year's challenge and workshop, we are excited to announce the 3rd Workshop on YouTube-8M Large-Scale Video Understanding, to be held on October 28, 2019, at the International Conference on Computer Vision (ICCV 2019) in Seoul, Korea. We invite researchers to participate in this large-scale video classification challenge and to report their results at the workshop, as well as to submit papers describing research, experiments, or applications based on YouTube-8M. The classification challenge will be hosted as a kaggle.com competition. We will offer a $2,500 travel award to each of the 10 top-performing teams (details here).


Time Content Presenter
9:00 - 9:05 Opening Remarks Paul Natsev
9:05 - 9:30 Overview of 2019 YouTube-8M Dataset & Challenge Challenge Organizers
Session 1
9:30 - 10:00 Invited Talk 1: Title TBD Jitendra Malik
10:00 - 10:30 Invited Talk 2: Title TBD Jean-Baptiste Alayrac
10:30 - 10:45 Coffee Break
Session 2
10:45 - 11:00 Organizers’ Tech Talk: Approaches and Analysis Organizers
11:00 - 12:30 Oral Session 1
6 presentations
12:30 - 2:00 Lunch on your own
Session 3
2:00 - 2:30 Invited Talk 3: Title TBD Cees Snoek
2:30 - 3:00 Invited Talk 4: Title TBD Dima Damen
3:00 - 4:00 Oral Session 2
4 presentations
4:00 - 4:15 Award Ceremony Paul Natsev
4:15 - 4:30 Coffee Break
Session 4
4:30 - 6:00 Poster Session All accepted papers

Invited Talks

Invited Talk 1
Jitendra Malik University of California at Berkeley

Invited Talk 2
Jean-Baptiste Alayrac DeepMind

Invited Talk 3
Cees Snoek University of Amsterdam

Invited Talk 4
Dima Damen University of Bristol

Call for Participation

As last year, we are soliciting participation in two different tracks:

Classification Challenge Track

This track will be organized as a Kaggle competition for large-scale video classification based on the YouTube-8M dataset. Researchers are invited to participate in the classification challenge by training a model on the public YouTube-8M training and validation sets and submitting video classification results on a blind test set.

This year, we have updated the dataset to include segment-level human-labeled ground truth for a subset of videos in the dataset. The granularity of the labeling therefore increases from one per video to one per 5 seconds. Each video will again come with time-localized frame-level features so classifier predictions can be made at segment-level granularity. Unlike the previous editions of this challenge, the competition task will focus on temporal localization within a video.

Segment/frame-level annotation or temporal localization is an important challenge in video understanding with various applications, such as searching within a video or discovering interesting action moments. In practice, segment-level annotation data is very hard and expensive to collect at large scale, making this problem very difficult. Thus, the main focus of this year's challenge is how to jointly leverage noisy video-level labels and a small segment-level calibration set in order to better annotate and temporally localize concepts of interest. We will evaluate submissions based on human-labeled data for the first time. There is no model size restriction this year, although we encourage participants to train a lighter single model instead of heavy ensembles.

Open-source TensorFlow code, implementing a few baseline segment-level classification models for YouTube-8M, along with training and evaluation scripts, is available at GitHub. For details on getting started with local or cloud-based training, please see our README and the getting started guide on Kaggle. Results will be scored by a Kaggle evaluation server and published on a public leaderboard, updated live for all submissions (scored on a portion of the test set), along with a final (private) leaderboard, published after the competition is closed (scored on the rest of the test set). Top-ranking submissions in the challenge leaderboard will be invited to the workshop to present their method.
Please see details on the Kaggle competition page.
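
As a rough illustration of what "segment-level granularity" means for frame-level features, the sketch below averages consecutive frame feature vectors into one vector per 5-second segment. It is not the baseline code from the GitHub repository: the function names, the mean-pooling choice, and the rate of one frame per second are assumptions made only for this example.

```python
from typing import List, Sequence

SEGMENT_SECONDS = 5    # from the text: one label per 5 seconds
FRAMES_PER_SECOND = 1  # assumption for illustration only


def segment_mean_pool(
    frames: Sequence[Sequence[float]],
    frames_per_segment: int = SEGMENT_SECONDS * FRAMES_PER_SECOND,
) -> List[List[float]]:
    """Average consecutive frame feature vectors into one vector per segment.

    A trailing partial segment (fewer than frames_per_segment frames)
    is dropped.
    """
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    usable = len(frames) - len(frames) % frames_per_segment
    segments = []
    for start in range(0, usable, frames_per_segment):
        chunk = frames[start:start + frames_per_segment]
        dim = len(chunk[0])
        # Mean over the frames in this segment, one value per feature dimension.
        segments.append(
            [sum(frame[d] for frame in chunk) / len(chunk) for d in range(dim)]
        )
    return segments


def segment_start_times(num_segments: int) -> List[int]:
    """Start time in seconds of each segment, counted from the video start."""
    return [i * SEGMENT_SECONDS for i in range(num_segments)]


if __name__ == "__main__":
    # 12 toy frames with 2-dimensional features; the last 2 frames are dropped.
    toy_frames = [[float(i), 1.0] for i in range(12)]
    pooled = segment_mean_pool(toy_frames)
    for start, vector in zip(segment_start_times(len(pooled)), pooled):
        print(start, vector)
```

A segment-level classifier would then score each pooled vector independently, so that a prediction is tied to a start time within the video rather than to the video as a whole.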

We encourage participants to explore the following topics (non-exhaustive list) and to submit papers to this workshop discussing their approaches and result analysis (publication is also a requirement for prize eligibility on the Kaggle competition):

  • large-scale multi-label video classification / annotation
  • temporal / sequence modeling and pooling approaches for video
  • temporal attention modeling mechanisms
  • video representation learning (e.g., classification performance vs. video descriptor size)
  • multi-modal (audio-visual) modeling and fusion approaches
  • learning from noisy / incomplete ground-truth labels
  • score calibration and ranking across classes and videos
  • multiple-instance learning
  • transfer learning, domain adaptation, generalization (across the 24 top-level verticals)
  • scale: performance vs. training data & compute quantity (#videos, #frames, #CPUs, etc.)

General Paper Track

Researchers are invited to submit any papers involving research, experimentation, or applications on the YouTube-8M dataset. Papers in this track need not tackle this year's task (segment-level video annotation). We welcome submissions on other tasks using the dataset, including our previous challenge topic (video-level annotation). Paper submissions will be reviewed by the workshop organizers and accepted papers will be invited for oral or poster presentations at the workshop.

We encourage participants to explore any relevant topics of interest using the YouTube-8M dataset, including but not limited to:

  • All of the topics listed above (with or without participation in the Kaggle challenge)
  • Large-scale video recommendations, search, and discovery
  • Joining YouTube-8M with other publicly available resources / datasets (e.g., exploring YouTube video metadata, testing public models / features on this dataset, etc.)
  • Dataset visualization / browsing interfaces
  • Label augmentation and cleanup, active learning
  • Open-source contributions to the community targeting this dataset

Submission to this track does not require participation in the challenge task, but must be related to the YouTube-8M dataset. We welcome new applications that we didn't think of! Paper submissions are expected to have 8 to 12 pages (no strict page limit) in the ICCV formatting style. Demo paper submissions are also welcome.

This year, the submission system does not distinguish between the two tracks. If you are submitting to the general paper track, please indicate "N/A" in the Kaggle team name section in the submission questionnaire. For submissions to the classification challenge track, this field is required.


Each of the top 10 ranked teams (on the final private leaderboard) will receive $2,500 per team as a travel award to attend the ICCV 2019 Conference. Prize eligibility requires adherence to the Competition Rules. Winners must submit and present a paper describing their approach to the workshop to be eligible for this award.

Submission Instructions


All submissions will be handled electronically, through our CMT submission site. Papers are limited to 8 pages, including figures and tables, in the ICCV style. Additional pages containing only cited references are allowed. Please refer to the files in the Author Guidelines page at the ICCV 2019 website for formatting instructions.

Review Process

Submitted papers will be reviewed by the organizing committee members, and a subset will be selected for oral or poster presentation. Submissions will be evaluated in terms of potential impact (e.g. performance on the classification challenge), technical depth & scalability, novelty, and presentation.

Blind Submission / Dual Submission / Page Limit Policies

We do not require blind submissions; author names and affiliations may be shown. We do not restrict submissions of relevant work that is under review or will be published elsewhere. Previously published work (except work published at previous YouTube-8M workshops) is also acceptable as long as it is retargeted towards YouTube-8M. Papers are limited to 8 pages, including figures and tables, but excluding references. The accepted papers will be linked on the workshop website and will appear in the ICCV proceedings through the CVF open access archive.

How to Submit

  1. Create an account at our CMT submission site. If you do not receive a confirmation email, you may reset your password before logging in.
  2. Submit your paper through the CMT site. You will have to input your Kaggle team name if you are submitting your approach used in the Kaggle competition.
  3. Submission deadline for this form is September 20, 2019, 11:59 PM (UTC/GMT).

Important Dates

Following the extension of the challenge deadline, we are running two rounds of paper submission. The first round is the same as before: for those who want to confirm acceptance before scheduling their trip to ICCV, we encourage you to submit a paper based on your intermediate results by 9/20. We will notify authors of the result by 9/24. Paper submission remains open until 10/18, one week after the competition closing date. The top 10 teams must submit a paper by this date to be eligible for the prize. We strongly encourage the winners to present in person, but if this is difficult (for instance, due to visa requirements), a remote presentation can be arranged, either live or through a video recording. The camera-ready deadline for all accepted papers is 10/25, through the CMT submission site. Optionally, authors of papers accepted in the 1st round may choose to officially publish their paper by submitting the camera-ready version by 9/27. We will send detailed instructions later.

Paper submission deadline (1st round) September 20, 2019 (11:59 PM UTC/GMT)
Paper Acceptance Notification (1st round) September 24, 2019
Paper camera-ready deadline (1st round) September 27, 2019
Challenge submission deadline October 11, 2019
Paper submission deadline (2nd round) & Winners' obligations deadline October 18, 2019 (11:59 PM UTC/GMT)
Paper Acceptance Notification (2nd round) & Challenge Winners Confirmation October 22, 2019
Paper camera-ready deadline (2nd round) October 25, 2019
Workshop date (at ICCV'19) October 28, 2019


General Chairs

Apostol (Paul) Natsev

Cordelia Schmid

Rahul Sukthankar

Program Chairs

Joonseok Lee

George Toderici

Challenge Organizers

Ke Chen

Julia Elliott

Nisarg Kothari

Hanhan Li

Joe Yue-Hei Ng

Sobhan Naderi Parizi

Walter Reade

David Ross

Javier Snaider

Balakrishnan Varadarajan

Sudheendra Vijayanarasimhan

Yexin Wang

Zheng Xu


If you have any questions, please email us at yt8m-challenge@google.com or use the YouTube-8M Users Group.