THE MATCHING-MINIMIZATION ALGORITHM, THE INCA ALGORITHM AND A MATHEMATICAL FRAMEWORK FOR VOICE CONVERSION WITH UNALIGNED CORPORA.
Abstract
This paper presents a mathematical framework that is suitable for voice conversion
and adaptation in speech processing. Voice conversion is formulated as a search for
the optimal correspondences between a set of source-speaker spectra and a set of
target-speaker spectra under a transform that compensates for speaker differences. It
is possible to simultaneously recover a bi-directional mapping between two sets of
vectors that is a parametric mapping (a transform) in one direction and a
non-parametric mapping (correspondences) in the reverse direction. An algorithm
referred to as Matching-Minimization (MM) is formally derived with proven
convergence and an optimal closed-form solution for each step. The algorithm is
closely related to the asymmetric-1 variant of the well-known INCA algorithm [1]
for which we also provide a proof within the same framework. The differences
between MM and INCA are delineated both theoretically and experimentally. In these
experiments, MM outperforms INCA in all scenarios tested. Like INCA, MM does not
require parallel corpora.
Unlike INCA, MM is suitable when only a small amount of adaptation data is available.
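
To make the alternating structure described above concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the MM derivation of this paper, nor INCA. It assumes an affine transform applied to the source vectors, squared Euclidean distance, and nearest-neighbour correspondences from each target vector back to the transformed source vectors; the direction of the mapping and all function names (fit_affine, match, alternate) are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_affine(X, Y):
    """Closed-form least-squares fit of Y ~ X @ A + b (assumed affine transform)."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return W[:-1], W[-1]  # A has shape (d, d), b has shape (d,)


def match(X_t, Y):
    """For each target row in Y, return the index of the nearest row in X_t."""
    d2 = ((Y[:, None, :] - X_t[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def alternate(X, Y, n_iter=20):
    """Alternate a matching step and a minimization step on one shared cost.

    X: (M, d) source vectors, Y: (N, d) target vectors, no pairing assumed.
    Returns the transform (A, b), the final correspondences, and the cost
    recorded after each iteration.
    """
    d = X.shape[1]
    A, b = np.eye(d), np.zeros(d)  # start from the identity transform
    costs = []
    for _ in range(n_iter):
        # Matching: transform fixed, choose the best correspondences.
        idx = match(X @ A + b, Y)
        # Minimization: correspondences fixed, refit the transform.
        A, b = fit_affine(X[idx], Y)
        costs.append(float(((Y - (X[idx] @ A + b)) ** 2).sum()))
    return A, b, idx, costs
```

Because each step minimizes the same squared-error cost with the other variable held fixed, the recorded cost cannot increase from one iteration to the next in this sketch (up to floating-point error). A call such as alternate(X, Y) with two unpaired arrays of spectral vectors returns the fitted transform together with the correspondences.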
