Learning to Extract Local Events from the Web

John Foley

Michael Bendersky

Vanja Josifovski

SIGIR 2015

Abstract

The goal of this work is extraction and retrieval of local events
from web pages. Examples of local events include small venue
concerts, theater performances, garage sales, movie screenings,
etc. We collect these events in the form of retrievable
calendar entries that include structured information about
event name, date, time and location.

Between existing information extraction techniques and
the availability of information on social media and semantic
web technologies, there are numerous ways to collect commercial,
high-profile events. However, most extraction techniques
require domain-level supervision, which is not attainable at
web scale. Similarly, while the adoption of the semantic web
has grown, there will always be organizations without the
resources or the expertise to add machine-readable annotations
to their pages. Therefore, our approach bootstraps
these explicit annotations to massively scale up local event
extraction.

We propose a novel event extraction model that uses distant
supervision to assign scores to individual event fields
(event name, date, time and location) and a structural algorithm
to optimally group these fields into event records. Our
model integrates information from both the entire source
document and its relevant sub-regions, and is highly scalable.
We evaluate our extraction model on all 700 million documents
in a large publicly available web corpus, ClueWeb12.
Using the 217,000 unique explicitly annotated events as
distant supervision, we are able to double recall with 85%
precision and quadruple it with 65% precision, with no additional
human supervision. We also show that our model can
be bootstrapped for a fully supervised approach, which can
further improve the precision by 30%.
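
To make the two-stage idea above concrete, here is a minimal sketch of scoring fields and then grouping them into records. It is not the model from the paper: the per-field scores are assumed to come from some classifier trained with distant supervision, and the grouping step is a simple greedy stand-in for the structural algorithm. The names `FieldCandidate`, `EventRecord`, `group_fields`, the `region` identifier, and the `min_score` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# The four event fields named in the abstract.
FIELD_TYPES = ("name", "date", "time", "location")


@dataclass(frozen=True)
class FieldCandidate:
    field_type: str  # one of FIELD_TYPES
    text: str        # the extracted span
    region: str      # hypothetical id of the page sub-region holding the span
    score: float     # hypothetical classifier confidence in [0, 1]


@dataclass
class EventRecord:
    region: str
    fields: Dict[str, FieldCandidate]
    score: float


def group_fields(
    candidates: List[FieldCandidate],
    min_score: float = 0.5,
    required: Tuple[str, ...] = ("name", "date"),
) -> List[EventRecord]:
    """Greedy grouping (an assumption, not the paper's algorithm):
    keep the best-scoring candidate per field type within each region,
    then emit one record per region that has all required fields."""
    best: Dict[str, Dict[str, FieldCandidate]] = {}
    for cand in candidates:
        if cand.field_type not in FIELD_TYPES:
            continue  # ignore unknown field types
        per_region = best.setdefault(cand.region, {})
        current = per_region.get(cand.field_type)
        if current is None or cand.score > current.score:
            per_region[cand.field_type] = cand

    records: List[EventRecord] = []
    for region, fields in best.items():
        if not all(name in fields for name in required):
            continue  # incomplete record
        # Record score: mean of the selected field scores.
        score = sum(c.score for c in fields.values()) / len(fields)
        if score >= min_score:
            records.append(EventRecord(region, fields, score))

    # Highest-confidence records first.
    records.sort(key=lambda r: r.score, reverse=True)
    return records
```

Raising `min_score` in this sketch trades recall for precision, which is the same kind of trade-off the reported 85% and 65% precision operating points reflect.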

In addition, we evaluate the geographic coverage of the
extracted events. We find that the geo-diversity of extracted
events increases significantly compared to existing explicit
annotations, while precision remains high.

Research Areas

Information Retrieval and the Web
