Video2Text: Learning to Annotate Video Content
Venue
ICDM Workshop on Internet Multimedia Mining (2009)
Publication Year
2009
Authors
Hrishikesh Aradhye, George Toderici, Jay Yagnik
Abstract
This paper discusses a new method for automatic discovery and organization of
descriptive concepts (labels) within large real-world corpora of user-uploaded
multimedia, such as YouTube.com. Conversely, it also provides validation of
existing labels, if any. While training, our method does not assume any explicit
manual annotation other than the weak labels already available in the form of video
title, description, and tags. Prior work related to such auto-annotation assumed
that a vocabulary of labels of interest (e.g., indoor, outdoor, city, landscape) is
specified a priori. In contrast, the proposed method begins with an empty
vocabulary. It analyzes audiovisual features of 25 million YouTube.com videos –
nearly 150 years of video data – effectively searching for consistent correlation
between these features and text metadata. It autonomously extends the label
vocabulary as and when it discovers concepts it can reliably identify, eventually
leading to a vocabulary with thousands of labels and growing. We believe that this
work significantly extends the state of the art in multimedia data mining,
discovery, and organization based on the technical merit of the proposed ideas as
well as the enormous scale of the mining exercise in a very challenging,
unconstrained, noisy domain.
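The abstract outlines a discovery loop: mine candidate labels from video metadata, test whether audiovisual features predict each candidate reliably, and grow the vocabulary with the candidates that pass. The following is a minimal Python sketch of that loop under assumed details; the feature representation, the candidate-extraction heuristic, the classifier choice, and thresholds such as `min_videos` and `min_auc` are illustrative stand-ins, not the method described in the paper.

```python
# Hypothetical sketch of a weak-label vocabulary-discovery loop.
# Assumes each video is a (feature_vector, metadata_tokens) pair;
# all names and thresholds here are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def discover_vocabulary(videos, min_videos=100, min_auc=0.8):
    """videos: list of (feature_vector, set_of_metadata_tokens)."""
    # Candidate labels: tokens that occur in enough videos' metadata.
    counts = {}
    for _, tokens in videos:
        for t in tokens:
            counts[t] = counts.get(t, 0) + 1
    candidates = [t for t, c in counts.items() if c >= min_videos]

    X = np.array([features for features, _ in videos])
    vocabulary = {}
    for label in candidates:
        # Weak supervision: presence of the token in the metadata
        # serves as a noisy positive label for that video.
        y = np.array([int(label in tokens) for _, tokens in videos])
        clf = LogisticRegression(max_iter=1000)
        # Keep a label only if audiovisual features predict it reliably
        # on held-out data; otherwise the correlation is too weak.
        auc = cross_val_score(clf, X, y, cv=3, scoring="roc_auc").mean()
        if auc >= min_auc:
            vocabulary[label] = clf.fit(X, y)
    return vocabulary
```

In this sketch the vocabulary starts empty and is extended only when a candidate concept is reliably identifiable from the features, mirroring the abstract's description; the actual system would operate at YouTube scale with far richer audiovisual features and validation machinery.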
