Quantitative Analysis of Culture Using Millions of Digitized Books
Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres,
Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy,
Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, Erez Lieberman Aiden
We constructed a corpus of digitized texts containing about 4% of all books ever
printed. Analysis of this corpus enables us to investigate cultural trends
quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic
and cultural phenomena that were reflected in the English language between 1800 and
2000. We show how this approach can provide insights about fields as diverse as
lexicography, the evolution of grammar, collective memory, the adoption of
technology, the pursuit of fame, censorship, and historical epidemiology.
Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array
of new phenomena spanning the social sciences and the humanities.