Publication Data
Quantitative Analysis of Culture Using Millions of Digitized Books
Abstract: We constructed a corpus of digitized texts containing about
4% of all books ever printed. Analysis of this corpus enables us to investigate
cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing
on linguistic and cultural phenomena that were reflected in the English language
between 1800 and 2000. We show how this approach can provide insights about fields as
diverse as lexicography, the evolution of grammar, collective memory, the adoption of
technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics
extends the boundaries of rigorous quantitative inquiry to a wide array of new
phenomena spanning the social sciences and the humanities.
