Publication Data
Kernel Based Text-Independent Speaker Verification
Abstract: The goal of a person authentication system is to
authenticate the claimed identity of a user. When this authentication is based on the
voice of the user, regardless of what exactly the user said, the system is called
a text-independent speaker verification system. Speaker verification systems are
increasingly used to secure personal information, particularly for mobile phone
based applications. Furthermore, text-independent versions of speaker verification
systems are the most widely used because of their simplicity, as they do not require complex speech
recognition modules. The most common approach to this task is based on Gaussian Mixture
Models (GMMs), which do not take into account any temporal information. GMMs have been
widely used thanks to their good performance, especially with the use of the
Maximum A Posteriori (MAP) adaptation algorithm. This approach is based on the density
estimation of an impostor data distribution, followed by its adaptation to a specific
client data set. Note that the estimation of these densities is not the final goal of
speaker verification systems, which is rather to discriminate between the client and impostor
classes; hence discriminative approaches appear to be good candidates for this task as
well. As a matter of fact, Support Vector Machine (SVM) based systems have been the
subject of several recent publications in the speaker verification community, in which
they achieve performance similar to or even better than that of GMMs on several
text-independent speaker verification tasks. In order to use SVMs or any other
discriminant approach for speaker verification, several modifications to the classical
techniques are required. The purpose of this chapter is to present an overview of discriminant
approaches that have been used successfully for the task of text-independent speaker
verification, and to analyze their differences and similarities with each other and
with classical generative approaches based on GMMs. An open-source version of the C++
source code used to perform all experiments described in this chapter can be found at
http://speaker.abracadoudou.com.
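
As an illustration of the GMM approach with MAP adaptation summarized above, the following is a minimal Python sketch, not the chapter's C++ implementation. It assumes diagonal-covariance GMMs, adapts only the means of the impostor (world) model to the client data, and scores an access by the average log-likelihood ratio between the client and world models. All names (`map_adapt_means`, `llr_score`, the dictionary layout of a GMM) and the default relevance factor of 16 are assumptions made for this example.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def component_log_probs(x, gmm):
    """Log of (weight * density) for every mixture component."""
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])
    ]


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of a single frame under the GMM."""
    return log_sum_exp(component_log_probs(x, gmm))


def map_adapt_means(world, frames, relevance=16.0):
    """Build a client GMM by MAP-adapting the world means to client frames.

    Weights and variances are copied from the world model (an assumption
    of this sketch); `relevance` controls how strongly the prior is kept.
    """
    k = len(world["weights"])
    dim = len(world["means"][0])
    counts = [0.0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x in frames:
        logs = component_log_probs(x, world)
        norm = log_sum_exp(logs)
        for j in range(k):
            post = math.exp(logs[j] - norm)  # component posterior
            counts[j] += post
            for d in range(dim):
                sums[j][d] += post * x[d]
    new_means = []
    for j in range(k):
        alpha = counts[j] / (counts[j] + relevance)
        adapted = []
        for d in range(dim):
            prior = world["means"][j][d]
            data_mean = sums[j][d] / counts[j] if counts[j] > 0.0 else prior
            adapted.append(alpha * data_mean + (1.0 - alpha) * prior)
        new_means.append(adapted)
    return {"weights": world["weights"], "means": new_means, "vars": world["vars"]}


def llr_score(frames, client, world):
    """Average per-frame log-likelihood ratio between client and world."""
    total = sum(
        gmm_log_likelihood(x, client) - gmm_log_likelihood(x, world)
        for x in frames
    )
    return total / len(frames)


# Toy one-dimensional data, chosen only for illustration.
world = {"weights": [0.5, 0.5], "means": [[0.0], [4.0]], "vars": [[1.0], [1.0]]}
client_frames = [[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]]
client = map_adapt_means(world, client_frames, relevance=1.0)
client_score = llr_score(client_frames, client, world)
impostor_score = llr_score([[-1.0], [3.0]], client, world)
```

In such a system the access would be accepted when the score exceeds a threshold tuned on held-out data. Note that the score depends on the two density estimates only through their ratio, which is the observation that motivates the discriminative alternatives discussed in the chapter.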
