We present Plato, a probabilistic model for entity resolution that includes a novel
approach for handling noisy or uninformative features, and supplements labeled
training data derived from Wikipedia with a very large unlabeled text corpus.
Training and inference in the proposed model can easily be distributed across many
servers, allowing it to scale to over 10^7 entities. We evaluate Plato on three
standard datasets for entity resolution. Our approach achieves the best results
to date on TAC KBP 2011 and is highly competitive on both the CoNLL 2003 and TAC
KBP 2012 datasets.