Auditory Sparse Coding

Steven R. Ness
Thomas Walters
Music Data Mining, CRC Press/Chapman Hall (2011)

Abstract

The concept of sparsity has attracted considerable interest in the field of machine learning in the past few years. Sparse feature vectors consist mostly of zeros, with only one or a few non-zero values. Although these feature vectors can be classified by traditional machine learning algorithms, such as support vector machines (SVMs), various recently developed algorithms explicitly take advantage of the sparse nature of the data, leading to massive speedups as well as improved performance. Fields that have benefited from sparse algorithms include finance, bioinformatics, text mining, and image classification. Because of their speed, these algorithms perform well on very large collections of data, which are becoming increasingly relevant given the huge amounts of data collected and warehoused by Internet businesses. We discuss the application of sparse feature vectors in the field of audio analysis, and specifically their use in conjunction with preprocessing systems that model the human auditory system. We present results that demonstrate the applicability of combining auditory-based processing with sparse coding for two content-based audio analysis tasks: a search task in which ranked lists of sound effects are retrieved from text queries, and a music information retrieval (MIR) task in which music is classified into genres.
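To make the notion of a sparse feature vector concrete, the following is a minimal Python sketch, not the method used in the chapter. It stores only the non-zero entries of a vector and computes a linear score by visiting those entries alone, which is the basic reason sparse-aware algorithms can be fast. The function names `to_sparse` and `sparse_dot` and the example values are illustrative assumptions.

```python
# Illustrative sketch only: names and values are assumptions, not from the chapter.

def to_sparse(dense):
    """Keep only the non-zero entries as a {index: value} mapping."""
    return {i: v for i, v in enumerate(dense) if v != 0}


def sparse_dot(sparse_vec, weights):
    """Dot product that visits only the non-zero entries of the feature vector."""
    return sum(v * weights[i] for i, v in sparse_vec.items())


# A feature vector that is mostly zeros, with two non-zero values.
dense = [0, 0, 3, 0, 0, 0, 1, 0]
# Hypothetical weights of a linear classifier or ranker.
weights = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 4.0, 1.0]

sparse = to_sparse(dense)
score = sparse_dot(sparse, weights)
```

The sparse dot product performs two multiplications here instead of eight; the saving grows with the dimensionality of the vector when the number of non-zero entries stays small.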