On Using Nearly-Independent Feature Families for High Precision and Confidence
Venue
Fourth Asian Machine Learning Conference, JMLR workshop and conference proceedings (2012), pp. 269-284
Publication Year
2012
Authors
Omid Madani, Manfred Georg, David Ross
Abstract
Often we require classification at a very high precision level, such as 99%. We
report that when very different sources of evidence such as text, audio, and video
features are available, combining the outputs of base classifiers trained on each
feature type separately (late fusion) can substantially increase the recall of
the combination at high precision, compared with a single classifier trained on
all the feature types (early fusion) or with the individual base classifiers.
We show how the probability of a joint false-positive
mistake can be upper bounded by the product of individual probabilities of
conditional false-positive mistakes, by identifying a simple key criterion that
needs to hold. This provides an explanation for the high precision phenomenon, and
motivates referring to such feature families as (nearly) independent. We assess the
relevant factors for achieving high precision empirically, and explore combination
techniques informed by the analysis. We compare a number of early and late fusion
methods, and observe that classifier combination via late fusion can more than
double the recall at high precision.
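
To make the product bound in the abstract concrete, the following is a minimal arithmetic sketch, not the authors' method. It assumes that the paper's key criterion holds (the abstract does not spell it out), and it reads "conditional false-positive probability" as the chance that an item is actually negative given that a base classifier fires, which is an interpretation rather than a definition taken from the paper. The function names are hypothetical.

```python
def joint_false_positive_bound(conditional_fp_probs):
    """Product of per-classifier conditional false-positive probabilities.

    Assumption: the independence-style criterion from the paper holds, so
    this product upper-bounds the probability of a joint false positive.
    """
    bound = 1.0
    for p in conditional_fp_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        bound *= p
    return bound


def fused_precision_lower_bound(conditional_fp_probs):
    """Lower bound on precision when every base classifier fires together."""
    return 1.0 - joint_false_positive_bound(conditional_fp_probs)


if __name__ == "__main__":
    # Illustrative numbers only: two base classifiers (say, text and video),
    # each wrong 10% of the time when it fires.
    probs = [0.10, 0.10]
    print("joint false-positive bound:", joint_false_positive_bound(probs))
    print("precision lower bound:", fused_precision_lower_bound(probs))
```

Under these assumptions, two base classifiers that are each only 90% precise would jointly reach roughly 99% precision on the items where both fire, which is the kind of gain the abstract attributes to late fusion over nearly-independent feature families.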
