Indexing the World Wide Web: The Journey So Far
Venue
Next Generation Search Engines: Advanced Models for Information Retrieval, IGI-Global (2012), pp. 1-28
Publication Year
2012
Authors
Abstract
In this chapter, we describe the key indexing components of today’s web search
engines. As the World Wide Web has grown, the systems and methods for indexing have
changed significantly. We present the data structures used, the features extracted,
the infrastructure needed, and the options available for designing a brand new
search engine. We highlight techniques that improve the relevance of results, discuss
trade-offs that make the best use of machine resources, and cover distributed processing
concepts in this context. In particular, we delve into indexing
phrases instead of terms, storage in memory vs. on disk, and data partitioning. We
conclude with some thoughts on organizing information for newly emerging
forms of data.