Web-Scale Multi-Task Feature Selection for Behavioral Targeting
Venue
Proceedings of The 21st ACM International Conference on Information and Knowledge Management (CIKM), ACM (2012) (to appear)
Publication Year
2012
Authors
Amr Ahmed, Mohamed Aly, Abhimanyu Das, Alex Smola, Tasos Anastasakos
BibTeX
Abstract
A typical behavioral targeting system optimizing purchase activities, called
conversions, faces two main challenges: the web-scale amounts of user histories to
process on a daily basis, and the relative sparsity of conversions. In this paper,
we try to address these challenges through feature selection. We formulate a
multi-task (or group) feature-selection problem among a set of related tasks
(sharing a common set of features), namely advertising campaigns. We apply a
group-sparse penalty consisting of a combination of an `1 and `2 penalty and an
associated fast optimization algorithm for distributed parameter estimation. Our
algorithm relies on a variant of the well known Fast Iterative Thresholding
Algorithm (FISTA), a closed-form solution for mixed norm programming and a
distributed subgradient oracle. To eciently handle web-scale user histories, we
present a distributed inference algorithm for the problem that scales to billions
of instances and millions of attributes. We show the superiority of our algorithm
in terms of both sparsity and ROC performance over baseline feature selection
methods (both single-task L1-regularization and multi-task mutual-information
gain).
