The Intervalgram: An Audio Feature for Large-Scale Cover-Song Recognition
Venue
From Sounds to Music and Emotions: 9th International Symposium, CMMR 2012, London, UK, June 19-22, 2012, Revised Selected Papers, Springer Berlin Heidelberg (2013), pp. 197-213
Publication Year
2013
Authors
Thomas C. Walters, David A. Ross, Richard F. Lyon
Abstract
We present a system for representing the musical content of short pieces of audio
using a novel chroma-based representation known as the ‘intervalgram’, which is a
summary of the local pattern of musical intervals in a segment of music. The
intervalgram is based on a chroma representation derived from the temporal profile
of the stabilized auditory image [10] and is made locally pitch invariant by means
of a ‘soft’ pitch transposition to a local reference. Intervalgrams are generated
for a piece of music using multiple overlapping windows, and the resulting sets of
intervalgrams form the basis of a system for detecting identical melodic and
harmonic progressions in a database of music. Using a dynamic-programming approach
to compare a reference song against the song database, performance is evaluated on
the ‘covers80’ dataset [4]. A first test of an intervalgram-based
system on this dataset yields a precision at top-1 of 53.8%, with an ROC curve that
shows very high precision up to moderate recall, suggesting that the intervalgram
robustly identifies the easier-to-match cover songs in the dataset. The
intervalgram is designed to support locality-sensitive hashing,
such that an index lookup from each single intervalgram feature has a moderate
probability of retrieving a match, with few false matches. With this indexing
approach, a large reference database can be quickly pruned before more detailed
matching, as in previous content-identification systems.
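The abstract only outlines the intervalgram construction, so the following numpy
sketch is one plausible reading of it rather than the paper's implementation. It
assumes 12-bin chroma vectors, takes the chroma vector at each window's centre as
the local reference, and realizes the ‘soft’ transposition as a
correlation-weighted blend over all twelve circular shifts (all illustrative
choices):

    import numpy as np

    def soft_transpose(chroma_window, reference):
        # Correlate the reference with every circular semitone shift of
        # the window's mean chroma, then blend the shifted copies by
        # those weights instead of committing to one hard transposition.
        mean_chroma = chroma_window.mean(axis=1)
        weights = np.array([np.dot(np.roll(mean_chroma, -s), reference)
                            for s in range(12)])
        weights = weights / (weights.sum() + 1e-9)
        out = np.zeros_like(chroma_window, dtype=float)
        for s in range(12):
            out += weights[s] * np.roll(chroma_window, -s, axis=0)
        return out

    def intervalgrams(chroma, width=32, hop=8):
        # Slide overlapping windows over a 12 x T chroma matrix and
        # softly transpose each one to its own local reference.
        feats = []
        for start in range(0, chroma.shape[1] - width + 1, hop):
            window = chroma[:, start:start + width]
            feats.append(soft_transpose(window, window[:, width // 2]))
        return feats

Because each window is transposed to its own local reference, a global pitch shift
of the whole recording leaves the features approximately unchanged, which is what
makes the representation locally pitch invariant.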
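The dynamic-programming comparison is likewise not spelled out in the abstract; a
standard subsequence-DTW-style recursion over a matrix of pairwise intervalgram
distances would look roughly like this (the free start and end points are an
assumption, chosen so a cover may align anywhere within the reference):

    import numpy as np

    def dp_match_cost(dist):
        # dist[i, j]: distance between intervalgram i of the candidate
        # song and intervalgram j of the reference song.
        n, m = dist.shape
        acc = np.full((n + 1, m + 1), np.inf)
        acc[0, :] = 0.0                # free start anywhere in the reference
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                acc[i, j] = dist[i - 1, j - 1] + min(
                    acc[i - 1, j - 1], # advance both sequences
                    acc[i - 1, j],     # candidate advances alone
                    acc[i, j - 1])     # reference advances alone
        return acc[n, 1:].min()        # free end anywhere in the reference

Songs in the database would then be ranked by this cost, lowest first.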
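Finally, the indexing idea: because a single intervalgram lookup should retrieve a
match with moderate probability and few false positives, a large database can be
pruned with one hash probe per feature. A generic sign-of-random-projection LSH
sketch, not the paper's hashing scheme (whose details the abstract does not give):

    import numpy as np
    from collections import defaultdict

    class LSHIndex:
        def __init__(self, dim, n_bits=16, seed=0):
            rng = np.random.default_rng(seed)
            self.planes = rng.standard_normal((n_bits, dim))  # random hyperplanes
            self.table = defaultdict(set)

        def _key(self, vec):
            # One bit per hyperplane: which side of it the vector falls on.
            bits = (self.planes @ vec) > 0
            return int(bits.dot(1 << np.arange(bits.size)))

        def add(self, vec, song_id):
            self.table[self._key(vec)].add(song_id)

        def candidates(self, vec):
            # Songs sharing this bucket; only these go on to the more
            # expensive dynamic-programming comparison.
            return self.table.get(self._key(vec), set())

Each intervalgram of every reference song (flattened to a vector) is added under
its song's id, and at query time the union of candidate sets over a probe's
intervalgrams gives the pruned shortlist for detailed matching.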
