Sparse coding of auditory features for machine hearing in interference
Venue
Proc. ICASSP, IEEE (2011)
Publication Year
2011
Authors
Richard F. Lyon, Gal Chechik, Jay Ponte
Abstract
A key problem in using the output of an auditory model as the input to a
machine-learning system in a machine-hearing application is to find a good
feature-extraction layer. For systems such as PAMIR (passive-aggressive model for
image retrieval) that work well with a large sparse feature vector, a conversion
from auditory images to sparse features is needed. For audio-file ranking and
retrieval from text queries, based on stabilized auditory images, we took a
multi-scale approach, using vector quantization to choose one sparse feature in
each of many overlapping regions of different scales, with the hope that in some
regions the features for a sound would be stable even when other interfering sounds
were present and affecting other regions. We recently extended our testing of this
approach using sound mixtures, and found that the sparse-coded auditory-image
features degrade less in interference than vector-quantized MFCC sparse features
do. This initial success suggests that our hope of robustness in interference may
indeed be realizable, via the general idea of sparse features that are localized in
a domain where signal components tend to be localized or stable.
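A minimal sketch of the multi-scale vector-quantization step described in the abstract, in Python with NumPy. Every specific here is an illustrative assumption rather than a detail from the paper: the region sizes, strides, patch downsampling, codebook size, and the random stand-ins for the stabilized auditory image and for the trained codebooks are all hypothetical.

import numpy as np

def extract_regions(image, box_sizes, stride_frac=0.5):
    # Yield overlapping rectangular patches of the auditory image at several
    # scales, each subsampled to a fixed 8x8 grid so every region produces a
    # vector of the same length. Box sizes (multiples of 8 here) and the
    # 50%-overlap stride are illustrative choices, not the paper's values.
    H, W = image.shape
    for bh, bw in box_sizes:
        sh, sw = max(1, int(bh * stride_frac)), max(1, int(bw * stride_frac))
        for r in range(0, H - bh + 1, sh):
            for c in range(0, W - bw + 1, sw):
                patch = image[r:r + bh, c:c + bw]
                yield patch[::bh // 8, ::bw // 8].ravel()

def sparse_code(image, codebooks, box_sizes):
    # One-of-K vector quantization per region: the nearest codeword in that
    # region's codebook sets a single 1, so the concatenated feature vector
    # is long and sparse, with exactly one active entry per region.
    code = []
    for patch, codebook in zip(extract_regions(image, box_sizes), codebooks):
        onehot = np.zeros(len(codebook))
        onehot[np.argmin(np.linalg.norm(codebook - patch, axis=1))] = 1.0
        code.append(onehot)
    return np.concatenate(code)

# Toy usage with random stand-ins (a real system would feed SAI frames and
# codebooks trained by vector quantization on patches from training audio):
rng = np.random.default_rng(0)
sai = rng.random((64, 64))                       # stand-in auditory image
sizes = [(16, 16), (32, 32)]
n = sum(1 for _ in extract_regions(sai, sizes))  # count overlapping regions
books = [rng.random((256, 64)) for _ in range(n)]
v = sparse_code(sai, books, sizes)
print(v.size, int(v.sum()))                      # one active bit per region

Because each region contributes exactly one active codeword, an interfering sound that corrupts some regions leaves the codes of the other regions unchanged, which is the localization property the abstract credits for the robustness over vector-quantized MFCC features.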
