Language-independent Compound Splitting with Morphological Operations
Venue
ACL HLT 2011, pp. 10
Publication Year
2011
Authors
Klaus Macherey, Andrew M. Dai, David Talbot, Ashok C. Popat, Franz Och
Abstract
Translating compounds is an important problem in machine translation. Since many
compounds have not been observed during training, they pose a challenge for
translation systems. Previous decompounding methods have often been restricted to a
small set of languages because they cannot deal with more complex compound-forming
processes. We present a novel unsupervised method that learns both the compound
parts and the morphological operations needed to split compounds into those parts.
The morphological operations are learned from a bilingual corpus, while monolingual
corpora are used to learn and filter the set of compound part candidates. We
evaluate our method within a machine translation task and show significant
improvements for various languages, which demonstrates the versatility of the
approach.
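
To make the idea of splitting with morphological operations concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hand-written table of part costs and a fixed list of linking elements (such as the German "s"), whereas the paper learns both from corpora. The names split_compound, part_costs, linkers, and split_penalty are hypothetical, and the toy costs are invented for illustration. The sketch uses dynamic programming to find the cheapest segmentation of a word into known parts, allowing a linking element to be dropped between parts.

```python
import math


def split_compound(word, part_costs, linkers=("",), split_penalty=1.0):
    """Return the cheapest split of `word` into known parts.

    part_costs: dict mapping a known part to a cost (lower is better).
        Assumed to be given here; the paper learns candidates from corpora.
    linkers: strings that may be deleted between two parts. The empty
        string means plain concatenation.
    split_penalty: extra cost for every part after the first one.
    """
    word = word.lower()
    n = len(word)
    # best[i] holds (cost, parts) for the cheapest segmentation of word[:i].
    best = [(math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        cost_i, parts_i = best[i]
        if math.isinf(cost_i):
            continue  # position i is not reachable
        for j in range(i + 1, n + 1):
            part = word[i:j]
            if part not in part_costs:
                continue
            for linker in linkers:
                end = j + len(linker)
                if word[j:end] != linker:
                    continue  # linking element does not match here
                if linker and end >= n:
                    continue  # a linking element must be followed by a part
                cost = cost_i + part_costs[part]
                if i > 0:
                    cost += split_penalty
                if cost < best[end][0]:
                    best[end] = (cost, parts_i + [part])
    # Fall back to the unsplit word when no segmentation was found.
    return best[n][1] or [word]


# Toy costs, invented for illustration only.
toy_costs = {"verkehr": 2.0, "zeichen": 2.0, "bahnhof": 1.0, "bahn": 2.0, "hof": 2.0}
print(split_compound("Verkehrszeichen", toy_costs, linkers=("", "s")))
print(split_compound("Bahnhof", toy_costs, linkers=("", "s")))
```

In this sketch a word that is itself a cheap known part stays unsplit, which mirrors the general concern that frequent lexicalized compounds should not be broken apart. How costs and operations are actually estimated is described in the paper and is not reproduced here.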
