Kernel Based Text-Independent Speaker Verification
Venue
Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods, Wiley (2009)
Publication Year
2009
Authors
Johnny Mariethoz, Yves Grandvalet, Samy Bengio
Abstract
The goal of a person authentication system is to authenticate the claimed identity
of a user. When this authentication is based on the voice of the user,
regardless of what the user actually said, the system is called a text-independent
speaker verification system. Speaker verification systems are increasingly
used to secure personal information, particularly in mobile-phone-based
applications. Furthermore, text-independent speaker verification
systems are the most widely used because of their simplicity, as they do not require complex
speech recognition modules. The most common approach to this task is based on
Gaussian Mixture Models (GMMs), which do not take into account any temporal
information. GMMs have been used extensively thanks to their good performance,
especially when combined with the Maximum A Posteriori (MAP) adaptation algorithm.
This approach first estimates the density of the impostor data distribution,
then adapts it to a specific client data set. Note that estimating
these densities is not the final goal of a speaker verification system, which is
rather to discriminate between the client and impostor classes; hence discriminative
approaches appear to be good candidates for this task as well. As a matter of fact,
Support Vector Machine (SVM) based systems have been the subject of several recent
publications in the speaker verification community, in which they obtain performance
similar to or even better than that of GMMs on several text-independent speaker
verification tasks. In order to use SVMs or any other discriminant approach for
speaker verification, several modifications to the classical techniques are
required. The purpose of this chapter is to present an overview of discriminant
approaches that have been used successfully for the task of text-independent
speaker verification, and to analyze their differences and similarities with each
other and with classical generative approaches based on GMMs. An open-source
version of the C++ source code used to perform all experiments described in this
chapter can be found at http://speaker.abracadoudou.com.
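
As a rough illustration of the generative baseline summarized in the abstract, the sketch below adapts a small impostor ("world") GMM to client data with MAP and scores test frames by an average log-likelihood ratio. It is not the chapter's C++ code. The function names, the toy one-dimensional data, the mean-only adaptation, the shared weights and variances, and the relevance factor of 16 are all assumptions made for this example.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_logs(x, weights, means, variances):
    """Per-component values of log(weight) + log density at frame x."""
    return [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]

def log_gmm(x, weights, means, variances):
    """Log density of the mixture at x, computed with log-sum-exp."""
    logs = component_logs(x, weights, means, variances)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the world model to client frames.

    Assumption: weights and variances are kept from the world model and
    `relevance` is a hypothetical relevance factor.
    """
    n_comp, dim = len(weights), len(means[0])
    counts = [0.0] * n_comp                        # soft frame counts
    sums = [[0.0] * dim for _ in range(n_comp)]    # posterior-weighted sums
    for x in frames:
        logs = component_logs(x, weights, means, variances)
        top = max(logs)
        exps = [math.exp(l - top) for l in logs]
        total = sum(exps)
        for k in range(n_comp):
            post = exps[k] / total                 # component posterior
            counts[k] += post
            for d in range(dim):
                sums[k][d] += post * x[d]
    # Interpolate between the client statistics and the world means.
    return [[(sums[k][d] + relevance * means[k][d]) / (counts[k] + relevance)
             for d in range(dim)] for k in range(n_comp)]

def llr_score(frames, weights, client_means, world_means, variances):
    """Average per-frame log-likelihood ratio, client versus world."""
    return sum(log_gmm(x, weights, client_means, variances)
               - log_gmm(x, weights, world_means, variances)
               for x in frames) / len(frames)

# Toy world model: two 1-D components (illustrative values only).
world_weights = [0.5, 0.5]
world_means = [[0.0], [5.0]]
world_vars = [[1.0], [1.0]]

client_frames = [[1.0], [1.2], [0.8], [6.0], [6.2], [5.8]]
client_means = map_adapt_means(client_frames, world_weights,
                               world_means, world_vars)

client_score = llr_score(client_frames, world_weights,
                         client_means, world_means, world_vars)
impostor_score = llr_score([[-1.0], [4.0]], world_weights,
                           client_means, world_means, world_vars)
```

In this sketch a claim would be accepted when the score exceeds a threshold tuned on held-out data. Because the frames are scored independently and averaged, their order has no effect, which matches the abstract's remark that GMMs ignore temporal information.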
