Automatic Optimization of Data Perturbation Distributions for Multi-Style Training in Speech Recognition

Mortaza Doulaty

Richard Rose

Olivier Siohan

Proceedings of the IEEE 2016 Workshop on Spoken Language Technology (SLT2016)

Download Google Scholar

Abstract

Speech recognition performance using deep neural network based acoustic models is known to degrade when the acoustic environment and the speaker population in the target utterances are significantly different from the conditions represented in the training data. To address these mismatched scenarios, multi-style training (MTR) has been used to perturb utterances in an existing uncorrupted and potentially mismatched training speech corpus to better match target domain utterances. This paper addresses the problem of determining the distribution of perturbation levels for a given set of perturbation types that best matches the target speech utterances. An approach is presented that, given a small set of utterances from a target domain, automatically identifies an empirical distribution of perturbation levels that can be applied to utterances in an existing training set.
Distributions are estimated for perturbation types that include acoustic background environments, reverberant room configurations, and speaker related variation like frequency and temporal warping.
The end goal is for the resulting perturbed training set to characterize the variability in the target domain and thereby optimize ASR performance. An experimental study is performed to evaluate the impact of this approach on ASR performance when the target utterances are taken from a simulated far-field acoustic environment.

Defining the technology of today and tomorrow.

Philosophy

People

Research areas

Foundational ML & Algorithms

Computing Systems & Quantum AI

Science, AI & Society

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Automatic Optimization of Data Perturbation Distributions for Multi-Style Training in Speech Recognition

Abstract

Research Areas

Learn more about how we conduct our research