Web-derived Pronunciations
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4289-4292
Publication Year
2009
Authors
Arnab Ghoshal, Martin Jansche, Sanjeev Khudanpur, Michael Riley, Morgan Ulinski
Abstract
Pronunciation information is available in large quantities on the Web, in the form
of IPA and ad-hoc transcriptions. We describe techniques for extracting candidate
pronunciations from Web pages and associating them with orthographic words,
filtering out poorly extracted pronunciations, normalizing IPA pronunciations to
better conform to a common transcription standard, and generating phonemic
transcriptions from ad-hoc ones. We show improvements on a letter-to-phoneme task
when using Web-derived rather than Pronlex pronunciations.
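
To make the extraction and filtering steps concrete, here is a minimal sketch. It is not the authors' method. It assumes that a candidate pronunciation appears directly after the word, delimited by slashes or square brackets, and that a simple length ratio is enough to discard obviously bad pairs. The names CANDIDATE, extract_candidates, and looks_plausible, and the max_ratio threshold, are hypothetical and chosen only for illustration.

```python
import re
import unicodedata

# Assumed page convention (hypothetical): an orthographic word followed by
# a pronunciation written as /.../ or [...], with no spaces inside it.
CANDIDATE = re.compile(
    r"\b([A-Za-z][A-Za-z'-]*)\s*(?:/([^/\s]+)/|\[([^\]\s]+)\])"
)


def extract_candidates(text):
    """Return (word, pronunciation) pairs found in plain page text."""
    pairs = []
    for match in CANDIDATE.finditer(text):
        word = match.group(1).lower()
        pron = match.group(2) or match.group(3)
        # Unicode NFC normalization makes visually identical IPA strings
        # compare equal; it is only a first step toward a common standard.
        pairs.append((word, unicodedata.normalize("NFC", pron)))
    return pairs


def looks_plausible(word, pron, max_ratio=3.0):
    """Crude filter (an assumption): reject pairs whose lengths differ wildly."""
    if not word or not pron:
        return False
    ratio = len(pron) / len(word)
    return 1.0 / max_ratio <= ratio <= max_ratio


if __name__ == "__main__":
    sample = "The word tomato /təˈmeɪtoʊ/ is common; cat [kæt] too."
    for word, pron in extract_candidates(sample):
        if looks_plausible(word, pron):
            print(word, pron)
```

A real system would need far more robust page parsing and a learned or alignment-based filter. The length-ratio check here only stands in for the filtering step named in the abstract.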
