The AVA dataset contains 192 videos, split into 154 training and 38 test videos. Each video has 15 minutes annotated in 3-second intervals, resulting in 300 annotated segments per video. These annotations are specified by two CSV files: ava_train_v1.0.csv and ava_test_v1.0.csv.

Each row contains an annotation for one person performing an action in an interval, where that annotation is associated with the middle frame. Different persons and multiple action labels are described in separate rows.

The format of a row is the following: video_id, middle_frame_timestamp, person_box, action_id

  • video_id: YouTube identifier
  • middle_frame_timestamp: in seconds from the start of the YouTube video.
  • person_box: top-left (x1, y1) and bottom-right (x2, y2) corners, normalized with respect to frame size, where (0.0, 0.0) corresponds to the top left and (1.0, 1.0) corresponds to the bottom right.
  • action_id: identifier of an action class, see ava_action_list_v1.0.pbtxt
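
The row format above can be read with a few lines of Python. This is a minimal sketch, not an official loader. It assumes that person_box is written out as four comma-separated values (x1, y1, x2, y2), so that each row has seven fields, and that the files have no header row. The names Annotation, parse_rows, box_to_pixels and group_actions are hypothetical helpers introduced for illustration, and the sample rows are made up rather than taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    video_id: str
    timestamp: float  # middle_frame_timestamp, seconds from video start
    box: tuple        # (x1, y1, x2, y2), normalized to [0, 1]
    action_id: int


def parse_rows(lines):
    """Parse rows assumed to be: video_id, timestamp, x1, y1, x2, y2, action_id."""
    annotations = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        video_id, timestamp, x1, y1, x2, y2, action_id = (f.strip() for f in row)
        box = (float(x1), float(y1), float(x2), float(y2))
        annotations.append(
            Annotation(video_id, float(timestamp), box, int(action_id))
        )
    return annotations


def box_to_pixels(box, width, height):
    """Scale a normalized box to pixel coordinates for a given frame size."""
    x1, y1, x2, y2 = box
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


def group_actions(annotations):
    """Collect the action labels that share a video, timestamp and box.

    Multiple labels for one person appear in separate rows, so rows with
    the same (video_id, timestamp, box) key are merged into one list.
    """
    grouped = defaultdict(list)
    for a in annotations:
        grouped[(a.video_id, a.timestamp, a.box)].append(a.action_id)
    return dict(grouped)


# Made-up sample rows for illustration only (not taken from the dataset).
SAMPLE = """\
vid_a,902,0.25,0.10,0.75,0.90,12
vid_a,902,0.25,0.10,0.75,0.90,17
vid_a,905,0.00,0.00,0.50,1.00,12
"""

annotations = parse_rows(io.StringIO(SAMPLE))
grouped = group_actions(annotations)
```

To read a real file, pass an open file object to parse_rows in place of the io.StringIO wrapper, for example open("ava_train_v1.0.csv", newline=""). Grouping on the exact box values relies on repeated rows for the same person carrying identical coordinates, which is an assumption of this sketch.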

The dataset is made available by Google Inc. under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Note: the AVA v2.0 release is on the way. Please refer to our arXiv paper update for details of changes over v1.0.
