Recursive Sparse Spatiotemporal Coding
Venue
Proceedings of the Fifth IEEE International Workshop on Multimedia Information Processing and Retrieval, IEEE Computer Society (2009)
Publication Year
2009
Authors
Thomas Dean, Greg Corrado, Richard Washington
Abstract
We present a new approach to learning sparse, spatiotemporal codes in which the
number of basis vectors, their orientations, velocities, and the size of their
receptive fields change over the duration of unsupervised training. The algorithm
starts with a relatively small initial basis of minimal temporal extent. This
initial basis is obtained through conventional sparse coding techniques and is
expanded over time by recursively constructing a new basis consisting of basis
vectors with larger temporal extent that proportionally conserve regions of
previously trained weights. These proportionally conserved weights are combined
with the result of adjusting newly added weights to represent a greater range of
primitive motion features. The size of the current basis is determined
probabilistically by sampling from existing basis vectors according to their
activation on the training set. The resulting algorithm produces bases consisting
of filters that are bandpass, spatially oriented and temporally diverse in terms of
their transformations and velocities. The basic methodology borrows inspiration
from the layer-by-layer learning of multiple-layer restricted Boltzmann machines
developed by Geoff Hinton and his students. Indeed, we can learn multiple-layer
sparse codes by training a stack of denoising autoencoders, but we have had greater
success using L1-regularized regression in a variation on Olshausen and Field's
original SPARSENET. To accelerate learning and focus attention, we apply a
space-time interest-point operator that selects for periodic motion. This
attentional mechanism enables us to efficiently compute and compactly represent a
broad range of interesting motion. We demonstrate the utility of our approach by
using it to recognize human activity in video. Our algorithm meets or exceeds the
performance of state-of-the-art activity-recognition methods.
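
Below is a minimal, illustrative sketch in Python of the L1-regularized sparse coding step referred to in the abstract, in the spirit of Olshausen and Field's SPARSENET. It is not the authors' implementation: the patch shape, basis size, sparsity penalty, and step sizes are assumptions chosen only to make the example self-contained, and the input here is random data standing in for whitened space-time patches selected by the interest-point operator.

    # Sketch of L1-regularized sparse coding on flattened spatiotemporal patches.
    # All constants below are illustrative assumptions, not values from the paper.
    import numpy as np

    rng = np.random.default_rng(0)

    PATCH_DIM = 8 * 8 * 2   # e.g. 8x8 pixels over 2 frames, flattened (assumed)
    NUM_BASIS = 64          # relatively small initial basis (assumed size)
    LAMBDA = 0.1            # L1 sparsity penalty (assumed)

    def infer_codes(X, D, lam=LAMBDA, steps=200):
        """ISTA: solve min_A 0.5*||X - D A||^2 + lam*||A||_1, column-wise."""
        L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
        A = np.zeros((D.shape[1], X.shape[1]))
        for _ in range(steps):
            G = D.T @ (D @ A - X)              # gradient of the quadratic term
            A = A - G / L
            A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # soft threshold
        return A

    def update_basis(X, D, A, lr=0.01):
        """One gradient step on reconstruction error, then renormalize columns."""
        D = D + lr * (X - D @ A) @ A.T
        return D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-8)

    # Toy training loop on random stand-in patches.
    D = rng.standard_normal((PATCH_DIM, NUM_BASIS))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for step in range(50):
        X = rng.standard_normal((PATCH_DIM, 100))   # stand-in for a patch batch
        A = infer_codes(X, D)
        D = update_basis(X, D, A)

The recursive step described in the abstract would then grow such a basis in temporal extent, proportionally conserving previously trained weights while adjusting the newly added ones; that expansion is omitted from this sketch.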
