Efficient and Accurate Label Propagation on Large Graphs and Label Sets
Venue
Proceedings International Conference on Advances in Multimedia, IARIA (2013)
Publication Year
2013
Authors
Michele Covell, Shumeet Baluja
BibTeX
Abstract
Many web-based application areas must infer label distributions starting from a
small set of sparse, noisy labels. Examples include searching for, recommending,
and advertising against image, audio, and video content. These labeling problems
must handle millions of interconnected entities (users, domains, content segments)
and thousands of competing labels (interests, tags, recommendations, topics).
Previous work has shown that graph-based propagation can be very effective at
finding the best label distribution across nodes, starting from partial information
and a weighted-connection graph. In their work on video recommendations, Baluja et
al. [1] showed high-quality results using Adsorption, a normalized propagation
process. An important step in the original formulation of Adsorption was
re-normalization of the label vectors associated with each node, between every
propagation step. That interleaved normalization forced computation of all label
distributions, in synchrony, in order to allow the normalization to be correctly
determined. Interleaved normalization also prevented use of standard linear-algebra
methods, like stabilized bi-conjugate gradient descent (BiCGStab) and Gaussian
elimination. This paper presents a method that replaces the interleaved
normalization with a single pre-normalization, done once before the main
propagation process starts, allowing use of selective label computation (label
slicing) as well as large-matrix-solution methods. As a result, much larger graphs
and label sets can be handled than in the original formulation and more accurate
solutions can be found in fewer propagation steps. We also report results from
using pre-normalized Adsorption in topic labeling for web domains, using label
slicing and BiCGStab.
