Publication Data
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
Abstract: We describe a novel approach for inducing unsupervised
part-of-speech taggers for languages that have no labeled training data, but have
translated text in a resource-rich language. Our method does not assume any knowledge
about the target language (in particular no tagging dictionary is assumed), making it
applicable to a wide array of resource-poor languages. We use graph-based label
propagation for cross-lingual knowledge transfer and use the projected labels as
constraints in an unsupervised model. Across six European languages, our approach
results in an average absolute improvement of 9.7% over the state-of-the-art baseline,
and 17.0% over vanilla hidden Markov models induced with EM.
