Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis

Hideki Kawahara

Yannis Agiomyrgiannakis

Heiga Zen

Proc. ISCA SSW9 (2016), pp. 238-245


Abstract

This paper introduces a general and flexible framework for F0 and aperiodicity analysis, specifically intended for high-quality speech synthesis and modification applications. The proposed framework consists of three subsystems: an instantaneous frequency estimator and initial aperiodicity detector, an F0 trajectory tracker, and an F0 refinement and aperiodicity extractor. A preliminary implementation of the proposed framework substantially outperformed existing F0 extractors (1/5 to 1/10 in terms of RMS F0 estimation error) in its ability to track temporally varying F0 trajectories. The front-end aperiodicity detector consists of a complex-valued wavelet analysis filter with a highly selective temporal and spectral envelope. This detector uses a new measure that quantifies the deviation from periodicity. The measure is less sensitive to slow FM and AM and correlates closely with the signal-to-noise ratio. The front end combines instantaneous frequency information over a set of filter outputs, using the measure to yield an observation probability map. The second stage generates the initial F0 trajectory from this map and signal power information. The final stage uses the deviation measure of each harmonic component and F0-adaptive time warping to refine the F0 and aperiodicity estimates. The proposed framework is flexible enough to integrate other sources of instantaneous frequency when they provide relevant information.
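To make the front-end idea concrete, the following is a minimal sketch of instantaneous frequency estimation from the output of a single complex-valued band-pass filter. It is not the paper's filter design or its aperiodicity measure: a Gaussian-windowed complex exponential (Gabor kernel) is swapped in as an assumed stand-in for the wavelet, and the names `gabor_kernel`, `analytic_band`, and `instantaneous_frequency`, along with the `cycles` and `span` parameters, are hypothetical.

```python
import cmath
import math


def gabor_kernel(fc, fs, cycles=4.0, span=4.0):
    """Complex Gabor kernel centred at fc Hz (assumed stand-in for the wavelet)."""
    sigma = cycles / (2.0 * math.pi * fc)   # Gaussian standard deviation, seconds
    half = int(round(span * sigma * fs))    # truncate at +/- span * sigma
    kernel = []
    for n in range(-half, half + 1):
        t = n / fs
        envelope = math.exp(-0.5 * (t / sigma) ** 2)
        kernel.append(envelope * cmath.exp(2j * math.pi * fc * t))
    return kernel


def analytic_band(signal, kernel):
    """Convolve a real signal with a complex kernel, keeping fully overlapped samples."""
    k = len(kernel)
    rev = kernel[::-1]
    return [
        sum(signal[i + j] * rev[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def instantaneous_frequency(band, fs):
    """Instantaneous frequency in Hz from the phase rotation between adjacent samples."""
    return [
        cmath.phase(b * a.conjugate()) * fs / (2.0 * math.pi)
        for a, b in zip(band, band[1:])
    ]


if __name__ == "__main__":
    fs = 8000.0                              # sampling rate, Hz (illustrative)
    f0 = 120.0                               # true frequency of the test tone, Hz
    tone = [math.cos(2.0 * math.pi * f0 * n / fs) for n in range(800)]
    # The filter is deliberately centred off the true F0 (110 Hz vs 120 Hz).
    band = analytic_band(tone, gabor_kernel(110.0, fs))
    estimates = instantaneous_frequency(band, fs)
    print(sum(estimates) / len(estimates))
```

For a clean sinusoid, the printed mean should land near the true 120 Hz even though the filter is centred at 110 Hz, which is what makes instantaneous frequency useful as an F0 cue. A full front end along the lines of the abstract would run a bank of such filters and weight their outputs with a periodicity measure, which this sketch does not attempt.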

Research Areas

Speech Processing
