Detecting Events and Key Actors in Multi-Person Videos
Venue
Computer Vision and Pattern Recognition (CVPR) (2016)
Publication Year
2016
Authors
Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei
Abstract
Multi-person event recognition is a challenging task, often with many people active
in the scene but only a small subset contributing to an actual event. In this
paper, we propose a model which learns to detect events in such videos while
automatically "attending" to the people responsible for the event. Our model does
not use explicit annotations regarding who or where those people are during
training and testing. In particular, we track people in videos and use a recurrent
neural network (RNN) to represent the track features. We learn time-varying
attention weights to combine these features at each time instant. The attended
features are then processed using another RNN for event detection/classification.
Since most video datasets with multiple people are restricted to a small number of
videos, we also collected a new basketball dataset comprising 257 basketball games
with 14K event annotations corresponding to 11 event classes. Our model outperforms
state-of-the-art methods for both event classification and detection on this new
dataset. Additionally, we show that the attention mechanism is able to consistently
localize the relevant players.
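
The attention step described in the abstract, where per-person track features are combined with learned weights at a single time instant, can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product scoring against a `query` vector, the function names, and the toy feature values are all assumptions made for illustration, and the two RNNs (the track encoder and the event classifier) are left out entirely.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(track_features, query):
    """Combine per-person track features at one time instant.

    track_features: list of N feature vectors, one per tracked person.
    query: hypothetical scoring vector. Scoring by dot product is an
           assumption of this sketch, not the paper's scoring function.
    Returns (weights, attended_feature).
    """
    # One scalar relevance score per tracked person.
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in track_features]
    # Normalize scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Weighted sum of the track features, dimension by dimension.
    dim = len(track_features[0])
    attended = [
        sum(w * f[d] for w, f in zip(weights, track_features)) for d in range(dim)
    ]
    return weights, attended


# Toy example: three tracked people with 2-D features at one time instant.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
weights, attended = attend(features, query=[2.0, 0.0])
```

In this toy setup the first person receives the largest weight because their feature aligns with the assumed query. In the model the abstract describes, the weights would vary over time, and the attended feature at each time instant would be passed to a second RNN for event detection or classification.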
