Hippocratic Abbreviation Expansion
Abstract
Incorrect normalization of text can be particularly damaging for applications
like text-to-speech synthesis (TTS) or typing auto-correction, where the
resulting normalization is presented directly to the user rather than fed to
downstream applications. In this paper, we focus on abbreviation expansion
for TTS, which requires a ``do no harm'', high-precision approach that yields few
expansion errors at the cost of leaving relatively many abbreviations
unexpanded. In the context of a large-scale, real-world TTS scenario, we
present methods for training classifiers to establish whether a particular
expansion is apt. When combined with the baseline text normalization component
of the TTS system, these classifiers yield a large increase in correct
abbreviation expansions together with a substantial reduction in incorrect
expansions.