Vocaine the Vocoder and Applications in Speech Synthesis
Abstract
Vocoders have recently received renewed attention as basic components in speech
synthesis applications such as voice transformation, voice conversion and
statistical parametric speech synthesis. This paper presents a new vocoder
synthesizer, referred to as Vocaine, that features a novel Amplitude
Modulated-Frequency Modulated (AM-FM) speech model, a new way to synthesize
non-stationary sinusoids using quadratic phase splines, and a super fast cosine
generator. Extensive evaluations are made against several state-of-the-art methods
in Copy-Synthesis and Text-To-Speech synthesis experiments. Vocaine matches or
outperforms STRAIGHT in Copy-Synthesis experiments and outperforms our baseline
real-time optimized Mixed-Excitation vocoder at the same computational cost. We
report that Vocaine considerably improves our statistical TTS synthesizers and that
our new statistical parametric synthesizer [1] matched the quality of our mature
production Unit-Selection system with uncompressed waveforms.
