Abstract: Pronunciation information is available in large quantities
on the Web, in the form of IPA and ad-hoc transcriptions. We describe techniques for
extracting candidate pronunciations from Web pages and associating them with
orthographic words, filtering out poorly extracted pronunciations, normalizing IPA
pronunciations to better conform to a common transcription standard, and deriving
phonemic pronunciations from ad-hoc transcriptions. We show improvements on a
letter-to-phoneme task when using Web-derived rather than Pronlex pronunciations.