Unsupervised Translation Sense Clustering
Venue
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2012)
Publication Year
2012
Authors
Mohit Bansal, John DeNero, Dekang Lin
Abstract
We propose an unsupervised method for clustering the translations of a word, such
that the translations in each cluster share a common semantic sense. Words are
assigned to clusters based on their usage distribution in large monolingual and
parallel corpora using the soft K-Means algorithm. In addition to describing our
approach, we formalize the task of translation sense clustering and describe a
procedure that leverages WordNet for evaluation. By comparing our induced clusters
to reference clusters generated from WordNet, we demonstrate that our method
effectively identifies sense-based translation clusters and benefits from both
monolingual and parallel corpora. Finally, we describe a method for annotating
clusters with usage examples.
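
As a rough illustration of the soft K-Means step mentioned in the abstract, the sketch below assigns each item a fractional membership in every cluster rather than a single hard label. It is not the authors' implementation. The squared Euclidean distance, the stiffness parameter `beta`, the initial centroids, and the two-dimensional toy vectors standing in for usage distributions of four translations are all assumptions made for this example.

```python
import math


def soft_kmeans(points, centroids, beta=1.0, iterations=20):
    """Generic soft K-Means sketch.

    Returns the final centroids and the responsibilities computed in the
    last assignment step. `beta` (an assumed parameter) controls how sharply
    memberships concentrate on the nearest centroid.
    """
    centroids = [list(c) for c in centroids]
    dims = len(points[0])
    resp = []
    for _ in range(iterations):
        # Assignment step: soft membership of every point in every cluster.
        resp = []
        for p in points:
            sq = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            nearest = min(sq)  # subtract the minimum for numerical stability
            weights = [math.exp(-beta * (d - nearest)) for d in sq]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # Update step: each centroid becomes a membership-weighted mean.
        for k in range(len(centroids)):
            mass = sum(r[k] for r in resp)
            if mass > 0:
                centroids[k] = [
                    sum(r[k] * p[d] for r, p in zip(resp, points)) / mass
                    for d in range(dims)
                ]
    return centroids, resp


# Hypothetical feature vectors for four translations of one source word.
points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
centroids, resp = soft_kmeans(points, [[1.0, 0.0], [0.0, 1.0]], beta=5.0)
```

With these toy vectors, the first two items end up with most of their membership in the first cluster and the last two in the second. Because memberships are fractional, a translation that is used in more than one sense can carry weight in several clusters at once.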
