Restricted Transfer Learning for Text Categorization
Venue
NIPS Workshop (2013) (to appear)
Publication Year
2013
Authors
Rajhans Samdani, Gideon Mann
Abstract
In practice, machine learning systems deal with multiple datasets over time. When
the feature spaces between these datasets overlap, it is possible to transfer
information from one task to another. Typically in transfer learning, all labeled
data from a source task is saved to be applied to a new target task thereby raising
concerns of privacy, memory and scaling. To ameliorate such concerns, we present a
semi-supervised algorithm for text categorization that transfers information across
tasks without storing the data of the source task. In particular, our technique
learns low-dimensional, sparse, word-cluster-based features from the source task
data and a massive amount of additional unlabeled data. Our algorithm is
data. Our algorithm is efficient, highly parallelizable, and outperforms competitive
baselines by up to 9% on several difficult benchmark text categorization tasks.
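
The paper itself does not include code here, but the idea of word-cluster-based features can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the `word_to_cluster` mapping is made up for the example, whereas in the paper such clusters would be learned from source-task and unlabeled data.

```python
from collections import Counter

# Hypothetical word -> cluster-id mapping. In the paper's setting this
# mapping would be learned from the source task plus unlabeled text;
# the entries below are invented purely for illustration.
word_to_cluster = {
    "goal": 0, "match": 0, "team": 0,      # a sports-like cluster
    "stock": 1, "market": 1, "profit": 1,  # a finance-like cluster
}

def cluster_features(tokens, n_clusters=2):
    """Map a bag of words to a low-dimensional vector of cluster counts.

    Unknown words are simply ignored (one of several possible choices).
    The resulting representation is low-dimensional and sparse compared
    with a raw bag-of-words vector, so it can be transferred to a new
    target task without storing the source task's documents."""
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    return [counts.get(c, 0) for c in range(n_clusters)]

doc = "the team scored a goal as the stock market fell".split()
print(cluster_features(doc))  # [2, 2]
```

The key transfer-learning point the sketch mirrors is that only the compact word-to-cluster mapping needs to be retained across tasks, not the labeled source data itself.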
