Unary Data Structures for Language Models
Venue
Interspeech 2011, International Speech Communication Association, pp. 1425-1428
Publication Year
2011
Authors
Jeffrey Sorensen, Cyril Allauzen
Abstract
Language models are important components of speech recognition and machine
translation systems. Trained on billions of words, and consisting of billions of
parameters, language models are often the single largest component of these
systems. Many techniques have been proposed to reduce their storage
requirements. A technique based on pointer-free, compact storage of ordinal
trees achieves compression competitive with the best proposed systems, while
retaining the full finite-state structure, and without resorting to
computationally expensive block compression schemes or lossy quantization
techniques.
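
To make the abstract's "pointer-free compact storage of ordinal trees" concrete, the following is a minimal sketch of a LOUDS-style (level-order unary degree sequence) encoding, a standard unary representation of ordinal trees consistent with the paper's title. It is not taken from the paper itself: the class name, the nested-list tree input, and the naive linear-scan rank/select are illustrative assumptions; succinct implementations replace those scans with O(1) rank/select indexes over the bit vector.

    # Illustrative sketch (not the paper's implementation): LOUDS encodes
    # each node in breadth-first order as d ones followed by a zero, where
    # d is the node's number of children, prefixed by a "10" super-root.
    from collections import deque

    class LoudsTree:
        def __init__(self, root_children):
            # root_children: nested lists, e.g. [[], [[]]] is a root with
            # two children, the second of which has one child of its own.
            self.bits = [1, 0]                 # super-root "10"
            queue = deque([root_children])
            while queue:                       # BFS: unary degree per node
                node = queue.popleft()
                self.bits.extend([1] * len(node) + [0])
                queue.extend(node)

        def _select(self, bit, k):
            # position of the k-th occurrence (1-based) of `bit`;
            # a real succinct structure answers this in O(1)
            seen = 0
            for i, b in enumerate(self.bits):
                seen += (b == bit)
                if seen == k:
                    return i
            raise IndexError

        def first_child(self, v):
            # children of node v are described right after the v-th zero
            p = self._select(0, v) + 1
            if p < len(self.bits) and self.bits[p] == 1:
                return sum(self.bits[:p + 1])  # rank1(p) = child's number
            return None

        def next_sibling(self, v):
            i = self._select(1, v)             # the '1' naming node v
            return v + 1 if self.bits[i + 1] == 1 else None

        def parent(self, v):
            i = self._select(1, v)
            return i - (v - 1)                 # zeros strictly before i

    # root (node 1) with children 2 and 3; node 3 has one child, node 4
    tree = LoudsTree([[], [[]]])
    print(tree.bits)               # [1, 0, 1, 1, 0, 0, 1, 0, 0]
    print(tree.first_child(1))     # 2
    print(tree.next_sibling(2))    # 3
    print(tree.first_child(3))     # 4
    print(tree.parent(4))          # 3

The point of the encoding is that the tree needs only two bits per node (one 1 and one 0) plus the rank/select index, with no child pointers at all, which is what makes it attractive for n-gram tries holding billions of parameters.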
