End-to-End Text-Dependent Speaker Verification
Venue
International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
Publication Year
2016
Authors
Georg Heigold, Ignacio Moreno, Samy Bengio, Noam M. Shazeer
Abstract
In this paper we present a data-driven, integrated approach to speaker
verification, which maps a test utterance and a few reference utterances directly
to a single score for verification and jointly optimizes the system’s components
using the same evaluation protocol and metric as at test time. Such an approach
will result in simple and efficient systems, requiring little domain-specific
knowledge and making few model assumptions. We implement the idea by formulating
the problem as a single neural network architecture, including the estimation of a
speaker model on only a few utterances, and evaluate it on our internal “Ok Google”
benchmark for text-dependent speaker verification. The proposed approach appears to
be very effective for big data applications like ours that require highly accurate,
easy-to-maintain systems with a small footprint.
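
The mapping described in the abstract, from one test utterance and a few reference utterances to a single score, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not stated in the abstract: utterances are taken to be already encoded as fixed-length embedding vectors (the network that would produce them is not shown), the speaker model is taken to be the mean of the reference embeddings, and the score is taken to be a logistic function of the cosine similarity with hypothetical parameters `w` and `b`. The function names are hypothetical as well.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (assumed speaker model)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verification_score(test_embedding, reference_embeddings, w=1.0, b=0.0):
    """Map one test embedding and a few reference embeddings to one score.

    w and b are hypothetical scale and offset parameters; in an end-to-end
    system they would be learned together with the embedding network.
    """
    speaker_model = mean_vector(reference_embeddings)
    similarity = cosine_similarity(test_embedding, speaker_model)
    return 1.0 / (1.0 + math.exp(-(w * similarity + b)))

def verification_loss(score, same_speaker):
    """Binary cross-entropy on the accept/reject decision (assumed loss)."""
    probability = score if same_speaker else 1.0 - score
    return -math.log(probability)

# Toy example with made-up 2-D embeddings.
references = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
score_target = verification_score([1.0, 0.05], references)
score_impostor = verification_score([0.0, 1.0], references)
```

Because the score and the loss are computed from the same quantities used at test time, gradients of such a loss could in principle flow back through the scoring step into the embedding network, which is the sense in which the components are optimized jointly.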
