Lattice-based Minimum Error Rate Training for Statistical Machine Translation
Venue
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, pp. 725–734
Publication Year
2008
Authors
Wolfgang Macherey, Franz Och, Ignacio Thayer, Jakob Uszkoreit
BibTeX
@inproceedings{macherey-etal-2008-lattice,
  title     = {Lattice-based Minimum Error Rate Training for Statistical Machine Translation},
  author    = {Macherey, Wolfgang and Och, Franz and Thayer, Ignacio and Uszkoreit, Jakob},
  booktitle = {Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing},
  publisher = {Association for Computational Linguistics},
  year      = {2008},
  pages     = {725--734}
}
Abstract
Minimum Error Rate Training (MERT) is an effective means to estimate the feature
function weights of a linear model such that an automated evaluation criterion for
measuring system performance can be directly optimized during training. To accomplish
this, the training procedure determines for each feature function its exact error
surface on a given set of candidate translations. The feature function weights are
then adjusted by traversing the error surface combined over all sentences and
picking those values for which the resulting error count reaches a minimum.
Typically, candidates in MERT are represented as N-best lists, which contain the N
most probable translation hypotheses produced by a decoder. In this paper, we
present a novel algorithm that efficiently constructs and represents
the exact error surface of all translations that are encoded in a phrase lattice.
Compared to N-best MERT, the number of candidate translations thus taken into
account increases by several orders of magnitude. The proposed method is used to
train the feature function weights of a phrase-based statistical machine
translation system. Experiments conducted on the NIST 2008 translation tasks show
significant runtime improvements and moderate BLEU score gains over N-best MERT.
