Cross-lingual projection for class-based language models
Abstract
This paper presents a cross-lingual projection technique for training class-based
language models. Building on previous success in projecting POS tags and NER
mentions across languages, we apply the same idea to training class-based
language models. We train a CRF model to predict when a sequence of words is a
member of a given class, and use this model to label our language model training
data. We show that the contextual cues for these classes can be projected
successfully across language pairs, retaining a high-quality class model in
languages with no supervised class data. We present empirical results that
demonstrate the quality of the projected models as well as their effect on the
downstream speech recognition objective. The projected class models achieve over
half of the word error rate (WER) reduction obtained by models trained on human
annotations.
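
To make the labeling step concrete, the following is a minimal sketch, not the
paper's implementation, of training a CRF to tag spans belonging to a class and
then tagging unlabeled language model text. It assumes the sklearn-crfsuite
package; the feature set, the NUMBER class, and the toy data are illustrative
assumptions, not details taken from the paper.

    import sklearn_crfsuite

    def word_features(sentence, i):
        # Simple contextual features for token i; the paper's feature set is
        # not reproduced here, so these are placeholders.
        word = sentence[i]
        return {
            "lower": word.lower(),
            "is_digit": word.isdigit(),
            "prev": sentence[i - 1].lower() if i > 0 else "<s>",
            "next": sentence[i + 1].lower() if i + 1 < len(sentence) else "</s>",
        }

    # Toy supervised data: BIO labels marking a hypothetical NUMBER class.
    train_sents = [["call", "me", "at", "five", "thirty"],
                   ["the", "meeting", "is", "at", "ten"]]
    train_tags = [["O", "O", "O", "B-NUMBER", "I-NUMBER"],
                  ["O", "O", "O", "O", "B-NUMBER"]]

    X = [[word_features(s, i) for i in range(len(s))] for s in train_sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, train_tags)

    # Tag unlabeled LM training text; tagged spans would then be replaced by
    # their class symbol before language model training.
    test = [["wake", "me", "at", "seven"]]
    X_test = [[word_features(s, i) for i in range(len(s))] for s in test]
    print(crf.predict(X_test))

In a class-based LM, each tagged span is typically collapsed to its class
symbol, with a separate within-class model generating the member words.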
