Pronunciation information is available in large quantities on the Web, in the form
of IPA and ad-hoc transcriptions. We describe techniques for extracting candidate
pronunciations from Web pages and associating them with orthographic words,
filtering out poorly extracted pronunciations, normalizing IPA pronunciations to
better conform to a common transcription standard, and generating phonemic
pronunciations from ad-hoc transcriptions. We show improvements on a
letter-to-phoneme task when using Web-derived pronunciations compared with
Pronlex pronunciations.
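The normalization step can be illustrated with a minimal sketch. This is not the method described in the paper. It simply shows one plausible way to pull IPA strings scraped from Web pages toward a common standard: apply Unicode NFC normalization, strip phonemic `/.../` or phonetic `[...]` delimiters, and replace look-alike ASCII characters with their IPA counterparts. The function name `normalize_ipa` and the `CANONICAL` mapping table are hypothetical. A real system would need a much larger table derived from the data.

```python
import unicodedata

# Hypothetical mapping from look-alike or ad-hoc characters to the IPA
# characters treated here as canonical (illustrative only, not exhaustive).
CANONICAL = {
    "g": "\u0261",  # ASCII g -> IPA voiced velar stop (U+0261)
    "'": "\u02c8",  # apostrophe -> IPA primary stress mark (U+02C8)
    ":": "\u02d0",  # colon -> IPA length mark (U+02D0)
}


def normalize_ipa(transcription: str) -> str:
    """Strip delimiters and map look-alike characters to canonical IPA."""
    # Compose combining sequences so equivalent strings compare equal.
    text = unicodedata.normalize("NFC", transcription.strip())
    # Drop phonemic /.../ or phonetic [...] delimiters if both are present.
    if len(text) >= 2 and (text[0], text[-1]) in {("/", "/"), ("[", "]")}:
        text = text[1:-1]
    # Replace each character that has a canonical IPA equivalent.
    return "".join(CANONICAL.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(normalize_ipa("/'gʊd/"))  # ˈɡʊd
    print(normalize_ipa("[ka:t]"))  # kaːt
```

A character-level table keeps the sketch simple. Real Web transcriptions would also require handling multi-character sequences and context-dependent choices, which this example does not attempt.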