Publication Data
A Tale of Two (Similar) Cities: Inferring City Similarity Through Geo-Spatial Query Log Analysis
Abstract: Understanding the backgrounds and interests of the people who
are consuming a piece of content, such as a news story, video, or music, is vital for
the content producer as well as the advertisers who rely on the content to provide a
channel on which to advertise. We extend traditional search-engine query log analysis,
which has primarily concentrated on analyzing either single or small groups of queries
or users, to examining the complete query stream of very large groups of users – the
inhabitants of 13,377 cities across the United States. Query logs can be a good
representation of the interests of a city’s inhabitants and a useful characterization
of the city itself. Further, we demonstrate how query logs can be effectively used to
gather city-level statistics sufficient for providing insights into the similarities
and differences between cities. Cities that are found to be similar through the use of
query analysis correspond well to the similar cities as determined through other
large-scale and time-consuming direct measurement studies, such as those undertaken by
the Census Bureau.
