On the necessity of irrelevant variables
Venue
ICML (2011)
Publication Year
2011
Authors
David P. Helmbold, Philip M. Long
BibTeX
Abstract
This work explores the effects of relevant and irrelevant boolean variables on the
accuracy of classifiers. The analysis uses the assumption that the variables are
conditionally independent given the class, and focuses on a natural family of
learning algorithms for such sources when the relevant variables have a small
advantage over random guessing. The main result is that algorithms relying
predominately on irrelevant variables have error probabilities that quickly go to 0
in situations where algorithms that limit the use of irrelevant variables have
errors bounded below by a positive constant. We also show that accurate learning is
possible even when there are so few examples that one cannot determine with high
confidence whether or not any individual variable is relevant.
