Hippocratic Abbreviation Expansion
Abstract
Incorrect normalization of text can be particularly damaging for applications like
text-to-speech synthesis (TTS) or typing auto-correction, where the resulting
normalization is presented directly to the user rather than being fed to downstream
applications. In this paper, we focus on abbreviation expansion for TTS, which
requires a ``do no harm'', high-precision approach yielding few expansion errors at
the cost of leaving relatively many abbreviations unexpanded. In the context of a
large-scale, real-world TTS scenario, we present methods for training classifiers
to establish whether a particular expansion is apt. When combined with the baseline
text normalization component of the TTS system, our methods yield a large increase
in correct abbreviation expansions, together with a substantial reduction in
incorrect expansions.
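
The ``do no harm'' policy described above can be illustrated with a minimal sketch. It assumes a classifier has already assigned each candidate expansion a confidence score in [0, 1]; the function name `expand_abbreviation`, the score dictionary, and the 0.9 threshold are hypothetical choices made for illustration and are not taken from the system described in this paper. The token is expanded only when the best candidate clears the threshold; otherwise it is left untouched.

```python
from typing import Dict


def expand_abbreviation(token: str,
                        scored: Dict[str, float],
                        threshold: float = 0.9) -> str:
    """Expand `token` only when the top candidate is confident enough.

    `scored` maps candidate expansions to hypothetical classifier scores.
    Below the threshold the original token is returned unchanged, trading
    recall (more unexpanded abbreviations) for precision (fewer errors).
    """
    if not scored:
        # No candidates at all: leave the abbreviation as written.
        return token
    best, score = max(scored.items(), key=lambda kv: kv[1])
    return best if score >= threshold else token


# A confident candidate is expanded; an ambiguous one is left alone.
print(expand_abbreviation("St.", {"Street": 0.97, "Saint": 0.03}))
print(expand_abbreviation("St.", {"Street": 0.60, "Saint": 0.40}))
```

Raising the threshold in such a scheme leaves more abbreviations unexpanded while reducing the chance of presenting a wrong expansion to the listener, which is the trade-off the abstract describes.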
