# A theory of learning from different domains

### Venue

Machine Learning, vol. 79 (2010), pp. 151-175

### Publication Year

2010

### Authors

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, Jennifer Wortman Vaughan

## Abstract

Discriminative learning methods for classification perform well when training and
test data are drawn from the same distribution. Often, however, we have plentiful
labeled training data from a source domain but wish to learn a classifier that
performs well on a target domain with a different distribution and little or no
labeled training data. In this work we investigate two questions. First, under what
conditions can a classifier trained from source data be expected to perform well on
target data? Second, given a small amount of labeled target data, how should we
combine it during training with the large amount of labeled source data to achieve
the lowest target error at test time? We address the first question by bounding a
classifier's target error in terms of its source error and the divergence between
the two domains. We give a classifier-induced divergence measure that can be
estimated from finite, unlabeled samples from the domains. Under the assumption
that there exists some hypothesis that performs well in both domains, we show that
this quantity, together with the empirical source error, characterizes the target
error of a source-trained classifier. We answer the second question by bounding the
target error of a model which minimizes a convex combination of the empirical
source and target errors. Previous theoretical work has considered minimizing just
the source error, just the target error, or weighting instances from the two
domains equally. We show how to choose the optimal combination of source and target
error as a function of the divergence, the sample sizes of both domains, and the
complexity of the hypothesis class. The resulting bound generalizes the previously
studied cases and is always at least as tight as a bound which considers minimizing
only the target error or an equal weighting of source and target errors.
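The classifier-induced divergence above is estimated by training a classifier to distinguish unlabeled source examples from unlabeled target examples: if no classifier in the hypothesis class can tell the domains apart, the divergence is small. A minimal numpy sketch of that recipe, where `proxy_a_distance` and the logistic-regression domain classifier are illustrative stand-ins (the paper defines the measure per hypothesis class, not for this specific learner):

```python
import numpy as np

rng = np.random.default_rng(0)

def proxy_a_distance(source_X, target_X, epochs=200, lr=0.1):
    """Estimate a classifier-induced divergence between two unlabeled
    samples: train a domain classifier to separate source (label 0)
    from target (label 1), then map its held-out error eps to
    2 * (1 - 2 * eps).  Near 0 when domains are indistinguishable,
    near 2 when they are easily separated."""
    X = np.vstack([source_X, target_X])
    y = np.concatenate([np.zeros(len(source_X)), np.ones(len(target_X))])
    # shuffle, then split into a training half and a held-out half
    idx = rng.permutation(len(X))
    X, y = X[idx], y[idx]
    half = len(X) // 2
    Xtr, ytr, Xte, yte = X[:half], y[:half], X[half:], y[half:]
    # plain logistic regression by full-batch gradient descent
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xtr @ w + b)))
        g = p - ytr
        w -= lr * Xtr.T @ g / len(Xtr)
        b -= lr * g.mean()
    err = np.mean(((Xte @ w + b) > 0) != yte)
    return 2.0 * (1.0 - 2.0 * err)

# identical domains -> divergence near 0; well-separated domains -> near 2
same = proxy_a_distance(rng.normal(0, 1, (500, 2)), rng.normal(0, 1, (500, 2)))
far = proxy_a_distance(rng.normal(0, 1, (500, 2)), rng.normal(4, 1, (500, 2)))
```

Because the divergence only needs domain labels (source vs. target), not task labels, it can be computed from the unlabeled samples the abstract refers to.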
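The second result concerns training on a convex combination of the two empirical errors, weighting target examples by α and source examples by 1 − α. A toy numpy sketch of such α-weighted training, using logistic loss as a convex surrogate for the 0-1 errors (`alpha_weighted_fit` is a hypothetical helper for illustration, not the paper's algorithm or its optimal-α formula):

```python
import numpy as np

rng = np.random.default_rng(1)

def alpha_weighted_fit(Xs, ys, Xt, yt, alpha, epochs=300, lr=0.2):
    """Minimize alpha * (empirical target loss) + (1 - alpha) *
    (empirical source loss) with logistic regression and full-batch
    gradient descent.  alpha=0 trains on source only, alpha=1 on
    target only; intermediate values interpolate."""
    w, b = np.zeros(Xs.shape[1]), 0.0
    for _ in range(epochs):
        ps = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))
        pt = 1.0 / (1.0 + np.exp(-(Xt @ w + b)))
        gw = alpha * Xt.T @ (pt - yt) / len(Xt) + (1 - alpha) * Xs.T @ (ps - ys) / len(Xs)
        gb = alpha * (pt - yt).mean() + (1 - alpha) * (ps - ys).mean()
        w -= lr * gw
        b -= lr * gb
    return w, b

def error(w, b, X, y):
    # empirical 0-1 error of the linear classifier sign(Xw + b)
    return np.mean(((X @ w + b) > 0) != y)

# toy domains: the target decision boundary is shifted relative to the source,
# and target labels are scarce (40 points) while source labels are plentiful
Xs = rng.normal(0, 1, (1000, 2)); ys = (Xs[:, 0] > 0.0).astype(float)
Xt = rng.normal(0, 1, (40, 2));   yt = (Xt[:, 0] > 0.8).astype(float)
Xe = rng.normal(0, 1, (1000, 2)); ye = (Xe[:, 0] > 0.8).astype(float)  # target test set

errs = {a: error(*alpha_weighted_fit(Xs, ys, Xt, yt, a), Xe, ye)
        for a in (0.0, 0.5, 1.0)}
```

Sweeping α from 0 (source only) to 1 (target only) traces out exactly the trade-off the bound quantifies: the best α depends on the divergence between domains, the two sample sizes, and the complexity of the hypothesis class.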