AVA: A Video Dataset of Atomic Visual Action

AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity. Each of the video clips has been exhaustively annotated by human annotators, and together they represent a rich variety of scenes, recording conditions, and expressions of human activity.

We provide the following annotations.

AVA-Kinetics Dataset

AVA-Kinetics, our latest release, is a crossover between the AVA Actions and Kinetics datasets. In order to provide localized action labels on a wider variety of visual scenes, we've provided AVA action labels on videos from Kinetics-700, nearly doubling the number of total annotations, and increasing the number of unique videos by over 500x. We hope this will expand the generalizability of localized action models, and open the door to new approaches in multi-task learning.

AVA-Kinetics is described in detail in the arXiv paper.

AVA-Kinetics is now available for download. It was the basis of a challenge in partnership with the ActivityNet workshops at CVPR 2020-2022.

AVA Actions Dataset

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute movie clips, where actions are localized in space and time, resulting in 1.62M action labels with multiple labels per human occurring frequently. A detailed description of our contributions with this dataset can be found in our accompanying CVPR '18 paper.

AVA v2.2 is now available for download. It was the basis of challenges in partnership with the ActivityNet workshop at CVPR 2018 and 2019.

AVA Spoken Activity Datasets

AVA ActiveSpeaker: associates speaking activity with a visible face, on the AVA v1.0 videos, resulting in 3.65 million frames labeled across ~39K face tracks. A detailed description of this dataset is in our arXiv paper. The labels are available for download here.

AVA Speech densely annotates audio-based speech activity in AVA v1.0 videos, and explicitly labels 3 background noise conditions, resulting in ~46K labeled segments spanning 45 hours of data. A detailed description of this dataset is in our Interspeech '18 paper. The labels are available for download here.

Ready to explore or start using AVA?

For announcements and details on upcomming challenges, please sign up to the Google Group: ava-dataset-users.