Publication Data
Back-off Language Model Compression
Abstract: With the availability of large amounts of training data
relevant to speech recognition scenarios, scalability becomes a very productive way to
improve language model performance. We present a technique that represents a back-off
n-gram language model using arrays of integer values and thus renders it amenable to
effective block compression. We propose a few such compression algorithms and evaluate
the resulting language model along two dimensions: memory footprint, and speed
reduction relative to the uncompressed one. We experimented with a model that uses a
32-bit word vocabulary (at most 4B words) and log-probabilities/back-off-weights
quantized to 1 byte, respectively. The best compression algorithm achieves 2.6
bytes/n-gram at ≈18X slower than uncompressed. For faster LM operation we found it
feasible to represent the LM at ≈4.0 bytes/n-gram, and ≈3X slower than the uncompressed
LM. The memory footprint of a LM containing one billion n-grams can thus be reduced to
3–4 Gbytes without impacting its speed too much. See the presentation material from
a talk about this paper.
