Publication Data
Linear classifiers are nearly optimal when hidden variables have diverse effects
Abstract: We analyze classification problems in which data is
generated by a two-tiered random process. The class is generated first, then a layer of
conditionally independent hidden variables, and finally the observed variables. For
sources like this, the Bayes-optimal rule for predicting the class given the values of
the observed variables is a two-layer neural network. We show that, if the hidden
variables have non-negligible effects on many observed variables, a linear classifier
approximates the error rate of the Bayes optimal classifier up to lower order terms. We
also show that the hinge loss of a linear classifier is not much more than the Bayes
error rate, which implies that an accurate linear classifier can be found efficiently.
