Large-Scale Speaker Identification
Venue
Proc. ICASSP, IEEE (2014)
Publication Year
2014
Authors
Ludwig Schmidt, Matthew Sharifi, Ignacio Lopez-Moreno
Abstract
Speaker identification is one of the main tasks in speech processing. In addition
to identification accuracy, large-scale applications of speaker identification give
rise to another challenge: fast search in the database of speakers. In this paper,
we propose a system based on i-vectors, a current approach for speaker
identification, and locality sensitive hashing, an algorithm for fast
nearest-neighbor search in high dimensions. The connection between the two
techniques is the cosine distance: on the one hand, we use the cosine distance to
compare i-vectors; on the other hand, locality sensitive hashing allows us to
quickly approximate the cosine distance in our retrieval procedure. We evaluate our
approach on a realistic data set from YouTube with about 1000 speakers. The results
show that our algorithm is approximately one to two orders of magnitude faster than
a linear search while maintaining the identification accuracy of an i-vector-based
system.
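
The retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes the random-hyperplane (sign random projection) family of locality sensitive hashing, a standard choice for cosine similarity; the abstract does not say which hash family or parameters the paper uses. The class name `HyperplaneLSH`, the parameters `num_bits` and `num_tables`, and the use of plain Python lists as stand-ins for i-vectors are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


class HyperplaneLSH:
    """Illustrative random-hyperplane LSH index (hypothetical, not the paper's code)."""

    def __init__(self, dim, num_bits=8, num_tables=10, seed=0):
        rng = random.Random(seed)
        # Each table has its own set of num_bits random Gaussian hyperplanes.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    def _key(self, planes, v):
        # One bit per hyperplane: which side of the hyperplane v falls on.
        # Vectors separated by a small angle agree on most bits.
        return tuple(
            sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0.0 for p in planes
        )

    def add(self, speaker_id, v):
        self.vectors[speaker_id] = v
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, v)].append(speaker_id)

    def query(self, v):
        # Collect only the speakers that share a bucket with v in some table,
        # then rank that short list by exact cosine similarity.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, v), []))
        if not candidates:
            return None
        return max(candidates, key=lambda s: cosine_similarity(self.vectors[s], v))
```

In this sketch, a query computes exact cosine similarities only for the candidates returned by the hash tables rather than for every enrolled speaker, which is where a speed-up over linear search would come from. Raising `num_bits` shrinks the buckets (faster, but more misses), while raising `num_tables` recovers recall at the cost of memory.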
