Inducing Sentence Structure from Parallel Corpora for Reordering
Venue
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics
Publication Year
2011
Abstract
When translating among languages that differ substantially in word order, machine
translation (MT) systems benefit from syntactic preordering, an approach that uses
features from a syntactic parse to permute source words into a target-language-like
order. This paper presents a method for inducing parse trees automatically from a
parallel corpus, instead of using a supervised parser trained on a treebank. These
induced parses are used to preorder source sentences. We demonstrate that our
induced parser is effective: it not only improves a state-of-the-art phrase-based
system with integrated reordering, but also approaches the performance of a recent
preordering method based on a supervised parser. These results show that the
syntactic structure relevant to MT preordering can be learned
automatically from parallel text, establishing a new application for
unsupervised grammar induction.
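The preordering idea described above can be illustrated with a toy sketch. It assumes a binary tree over source words whose internal nodes carry a hand-set swap flag. Preordering then reads off the leaves, emitting the right child before the left wherever the flag is set. For comparison, a target-like order is derived from word alignments by sorting source words on their mean aligned target position. The tree, the swap flags, the alignment, and the names `Node`, `leaf`, `preorder`, and `target_order` are all hypothetical. This is not the paper's method, which induces its trees from a parallel corpus rather than specifying them by hand.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical binary tree node over source tokens (illustration only)."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    word: Optional[str] = None  # set only on leaves
    swap: bool = False          # True: emit the right child before the left child


def leaf(w: str) -> Node:
    """Build a leaf node holding a single source word."""
    return Node(word=w)


def preorder(node: Node) -> List[str]:
    """Read off leaves left to right, inverting children of nodes whose swap flag is set."""
    if node.word is not None:
        return [node.word]
    first, second = (node.right, node.left) if node.swap else (node.left, node.right)
    return preorder(first) + preorder(second)


def target_order(src: List[str], alignment: List[Tuple[int, int]]) -> List[str]:
    """Sort source words by the mean target index they align to.

    `alignment` holds (source_index, target_index) pairs. Unaligned words fall back
    to their source index as a sort key, a crude heuristic assumed here for brevity.
    """
    keys = []
    for i in range(len(src)):
        tgt = [j for (s, j) in alignment if s == i]
        key = sum(tgt) / len(tgt) if tgt else float(i)
        keys.append((key, i))  # source index breaks ties stably
    return [src[i] for _, i in sorted(keys)]


if __name__ == "__main__":
    # English SVO source; a Japanese-like SOV target ("jon wa ringo o tabeta").
    src = ["John", "ate", "apples"]
    # Hand-built tree: S(John, VP(ate, apples)) with the VP marked for inversion.
    tree = Node(left=leaf("John"), right=Node(left=leaf("ate"), right=leaf("apples"), swap=True))
    print(preorder(tree))  # ['John', 'apples', 'ate']
    # Hand-made alignment: John->jon (0), ate->tabeta (4), apples->ringo (2).
    print(target_order(src, [(0, 0), (1, 4), (2, 2)]))  # ['John', 'apples', 'ate']
```

In this toy case a single inversion at the verb-phrase node is enough to turn English subject-verb-object order into the verb-final order implied by the alignment; real preordering systems must predict such inversions for unseen sentences.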
