Sound Ranking Using Auditory Sparse-Code Representations
Venue
ICML 2009 Workshop on Sparse Methods for Music Audio
Publication Year
2009
Authors
Martin Rehn, Richard F. Lyon, Samy Bengio, Thomas C. Walters, Gal Chechik
Abstract
The task of ranking sounds from text queries is a good test application for
machine-hearing techniques, and particularly for comparison and evaluation of
alternative sound representations in a large-scale setting. We have adapted a
machine-vision system, "passive-aggressive model for image retrieval" (PAMIR),
which efficiently learns, using a ranking-based cost function, a linear mapping
from a very large sparse feature space to a large query-term space. Using this
system allows us to focus on comparison of different auditory front ends and
different ways of extracting sparse features from high-dimensional auditory images.
In addition to two main auditory-image models, we also include and compare a family
of more conventional MFCC front ends. The experimental results show a significant
advantage for the auditory models over vector-quantized MFCCs. The two auditory
models tested use the adaptive pole-zero filter cascade (PZFC) auditory filterbank
and sparse-code feature extraction from stabilized auditory images via multiple
vector quantizers. The models differ in their implementation of the strobed
temporal integration used to generate the stabilized image. Using ranking
precision-at-top-k performance measures, the best results are about 70% top-1
precision and 35% average precision, using a test corpus of thousands of sound
files and a query vocabulary of hundreds of words.
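The learning component the abstract describes is PAMIR's passive-aggressive ranking update: given a text query, a relevant sound, and an irrelevant sound, a linear map is nudged whenever the relevant sound fails to outscore the irrelevant one by a unit margin. Below is a minimal Python sketch of that update, assuming dense numpy vectors; the function names, the aggressiveness parameter C, and the triplet interface are illustrative, not taken from the authors' implementation.

    import numpy as np

    def pamir_train(triplets, n_terms, n_features, C=1.0):
        """Passive-aggressive training of a linear query-to-sound mapping W.

        Each triplet is (q, x_pos, x_neg): a query term vector, the feature
        vector of a relevant sound, and that of an irrelevant sound.
        """
        W = np.zeros((n_terms, n_features))
        for q, x_pos, x_neg in triplets:
            # Hinge loss on the ranking margin q'W x_pos - q'W x_neg >= 1.
            loss = max(0.0, 1.0 - q @ W @ x_pos + q @ W @ x_neg)
            if loss > 0.0:
                # Rank-one update: outer product of the query with the
                # difference between the relevant and irrelevant sounds.
                V = np.outer(q, x_pos - x_neg)
                denom = np.linalg.norm(V) ** 2
                if denom > 0.0:
                    W += min(C, loss / denom) * V
        return W

    def rank_sounds(W, q, X):
        """Score every sound (rows of X) against query q; higher is better."""
        return X @ (q @ W)

In the paper's setting the sound vectors are very high-dimensional sparse codes, so the rank-one updates touch only the nonzero coordinates, which is what makes this approach practical at large scale.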
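The sparse features themselves are built by vector-quantizing patches of each stabilized auditory image with multiple codebooks and accumulating the codeword counts into one long histogram per sound file. The following sketch illustrates that scheme under an assumed patch geometry; the paper's actual patch layout and codebook sizes differ.

    import numpy as np

    def sparse_code_features(auditory_images, codebooks, patch_slices):
        """Map a sequence of stabilized auditory images to one sparse
        histogram by quantizing fixed rectangular patches of each frame
        against per-patch codebooks (illustrative layout only).
        """
        sizes = [cb.shape[0] for cb in codebooks]
        offsets = np.cumsum([0] + sizes[:-1])
        hist = np.zeros(sum(sizes))
        for frame in auditory_images:
            for (rows, cols), cb, off in zip(patch_slices, codebooks, offsets):
                patch = frame[rows, cols].ravel()
                # Nearest codeword for this patch; one count per frame.
                idx = np.argmin(np.linalg.norm(cb - patch, axis=1))
                hist[off + idx] += 1
        return hist / max(1, len(auditory_images))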
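The reported figures (about 70% top-1 precision, 35% average precision) use ranking precision-at-top-k measures. A reference computation, assuming a ranked list of sound ids and a set of relevant ids per query:

    def precision_at_k(ranked_ids, relevant_ids, k):
        """Fraction of the top-k ranked sounds that are relevant."""
        return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

    def average_precision(ranked_ids, relevant_ids):
        """Mean of precision@k over the ranks k at which hits occur."""
        hits, total = 0, 0.0
        for k, d in enumerate(ranked_ids, start=1):
            if d in relevant_ids:
                hits += 1
                total += hits / k
        return total / max(1, len(relevant_ids))

Top-1 precision is precision_at_k with k=1, averaged over the query vocabulary.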
