Publication Data
Unsupervised Translation Sense Clustering
Abstract: We propose an unsupervised method for clustering the
translations of a word, such that the translations in each cluster share a common
semantic sense. Words are assigned to clusters based on their usage distribution in
large monolingual and parallel corpora using the soft K-Means algorithm. In addition to
describing our approach, we formalize the task of translation sense clustering and
describe a procedure that leverages WordNet for evaluation. By comparing our induced
clusters to reference clusters generated from WordNet, we demonstrate that our method
effectively identifies sense-based translation clusters and benefits from both
monolingual and parallel corpora. Finally, we describe a method for annotating clusters
with usage examples.
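
As an illustration of the soft K-Means step named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each translation has already been reduced to a numeric feature vector summarizing its usage distribution. The function name `soft_kmeans`, the stiffness parameter `beta`, the squared Euclidean distance, and the toy vectors are all hypothetical choices made for this example.

```python
import math


def soft_kmeans(points, centers, beta=5.0, iterations=50):
    """Soft K-Means: every point belongs to every cluster with some weight.

    points  -- list of equal-length feature vectors (assumed usage features)
    centers -- initial cluster centers, one per cluster
    beta    -- stiffness (assumed value); larger values approach hard K-Means
    """
    centers = [list(c) for c in centers]
    resp = []
    for _ in range(iterations):
        # Assignment step: responsibilities from squared Euclidean distance.
        resp = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            best = min(d2)  # subtract the minimum for numerical stability
            weights = [math.exp(-beta * (d - best)) for d in d2]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # Update step: each center becomes the responsibility-weighted mean.
        for k in range(len(centers)):
            mass = sum(r[k] for r in resp)
            if mass > 0:
                centers[k] = [
                    sum(r[k] * p[j] for r, p in zip(resp, points)) / mass
                    for j in range(len(points[0]))
                ]
    return centers, resp


# Toy data (invented for illustration): four translations, two features each.
toy_points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_centers, toy_resp = soft_kmeans(toy_points, [[1.0, 0.0], [0.0, 1.0]])
for vector, weights in zip(toy_points, toy_resp):
    print(vector, [round(w, 3) for w in weights])
```

Each row of the returned responsibilities sums to 1, so a translation that is used in more than one sense can carry weight in several clusters rather than being forced into a single one.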
