Linear classifiers are nearly optimal when hidden variables have diverse effects
Venue
Machine Learning, vol. 86 (2012), pp. 209–231
Publication Year
2012
Authors
Nader H. Bshouty, Philip M. Long
Abstract
We analyze classification problems in which data is generated by a two-tiered
random process. The class is generated first, then a layer of conditionally
independent hidden variables, and finally the observed variables. For sources like
this, the Bayes-optimal rule for predicting the class given the values of the
observed variables is a two-layer neural network. We show that, if the hidden
variables have non-negligible effects on many observed variables, a linear
classifier approximates the error rate of the Bayes-optimal classifier up to
lower-order terms. We also show that the hinge loss of a linear classifier is not much
more than the Bayes error rate, which implies that an accurate linear classifier
can be found efficiently.
