The AVA dataset densely annotates 80 atomic visual actions in 351k movie clips, with actions localized in space and time, resulting in 1.65M action labels, with multiple labels per person
occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic
visual actions, rather than composite actions; (2) precise spatio-temporal annotations, with
possibly multiple annotations for each person; (3) exhaustive annotation of these atomic
actions over 15-minute video clips; and (4) the use of movies to gather a varied set of action representations.
AVA v2.0 is now available for download and is described in detail in this arXiv paper. AVA v2.0 will also be the basis of a challenge at the ActivityNet workshop at CVPR 2018.
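To make the annotation structure concrete, the sketch below parses AVA-style CSV rows, where each row ties one action label to a person box at a keyframe timestamp, so a single box can appear on several rows. This is a minimal illustration assuming a plain CSV layout of video id, timestamp, normalized box corners, and a numeric action id; the column names and sample rows here are hypothetical, not taken from the official release.

```python
import csv
import io
from collections import namedtuple

# One AVA-style annotation row (column layout assumed for illustration):
# video_id, keyframe timestamp in seconds, box corners normalized to
# [0, 1], and a numeric action id.
Label = namedtuple("Label", "video_id timestamp x1 y1 x2 y2 action_id")

def parse_ava_csv(text):
    """Parse AVA-style CSV rows into Label tuples."""
    labels = []
    for row in csv.reader(io.StringIO(text)):
        vid, ts, x1, y1, x2, y2, action = row[:7]
        labels.append(Label(vid, float(ts), float(x1), float(y1),
                            float(x2), float(y2), int(action)))
    return labels

# Hypothetical sample: the same person box at the same timestamp carries
# two action labels, illustrating "multiple labels per person".
sample = ("clip001,902.0,0.1,0.2,0.5,0.9,12\n"
          "clip001,902.0,0.1,0.2,0.5,0.9,17\n")
labels = parse_ava_csv(sample)
print(len(labels))  # two labels on one box
```

Grouping rows by (video id, timestamp, box) would then recover the per-person label sets that the dataset statistics describe.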