Efficient Estimation of Word Representations in Vector Space
Venue
International Conference on Learning Representations (2013)
Publication Year
2013
Authors
Tomas Mikolov, Kai Chen, Greg S. Corrado, Jeffrey Dean
Abstract
We propose two novel model architectures for computing continuous vector
representations of words from very large data sets. The quality of these
representations is measured in a word similarity task, and the results are compared
to the previously best performing techniques based on different types of neural
networks. We observe large improvements in accuracy at much lower computational
cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6
billion words data set. Furthermore, we show that these vectors provide
state-of-the-art performance on our test set for measuring syntactic and semantic
word similarities.
