Top-k Publish-Subscribe for Social Annotation of News
Venue
Proceedings of the 39th International Conference on Very Large Data Bases, VLDB Endowment (2013)
Publication Year
2013
Authors
Alexander Shraer, Maxim Gurevich, Marcus Fontoura, Vanja Josifovski
Abstract
Social content, such as Twitter updates, often has the quickest first-hand reports
of news events, as well as numerous commentaries that are indicative of public views
of such events. As such, social updates provide a good complement to professionally
written news articles. In this paper we consider the problem of automatically
annotating news stories with social updates (tweets) at a news website serving
a high volume of pageviews. The high rates of both pageviews (millions to billions
a day) and incoming tweets (more than 100 million a day) make real-time
indexing of tweets ineffective, as this requires an index that is both queried and
updated extremely frequently. The rate of tweet updates also makes caching techniques
almost unusable, since the cache would become stale very quickly. We propose a novel
architecture where each story is treated as a subscription for tweets relevant to
the story's content, and new algorithms that efficiently match tweets to stories,
proactively maintaining the top-k tweets for each story. Such top-k pub-sub
consumes only a small fraction of the resource cost of alternative solutions, and
may be applicable to other large-scale content-based publish-subscribe problems. We
demonstrate the effectiveness of our approach on real-world data: a corpus of news
stories from Yahoo! News and a log of Twitter updates.
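The idea of treating each story as a standing subscription can be illustrated with a minimal sketch. It assumes stories and tweets are sparse term-weight vectors scored by dot product. Story terms are kept in a plain inverted index, and each story holds a size-k min-heap of its best tweets. The class and method names (`TopKPubSub`, `subscribe`, `publish`, `results`) are hypothetical, and the efficient matching algorithms proposed in the paper are not reproduced here. The point is only the shape of the architecture: work happens when a tweet arrives, and serving a pageview is a lookup.

```python
import heapq
from collections import defaultdict


class TopKPubSub:
    """Hypothetical naive sketch: each story is a subscription holding its top-k tweets."""

    def __init__(self, k):
        self.k = k
        self.postings = defaultdict(list)  # term -> [(story_id, weight)]
        self.top = {}                       # story_id -> min-heap of (score, tweet_id)

    def subscribe(self, story_id, term_weights):
        # Index the story's (assumed) term-weight vector once, at subscription time.
        self.top[story_id] = []
        for term, w in term_weights.items():
            self.postings[term].append((story_id, w))

    def publish(self, tweet_id, term_weights):
        # Accumulate dot-product scores for every story sharing a term with the tweet.
        scores = defaultdict(float)
        for term, w in term_weights.items():
            for story_id, sw in self.postings.get(term, ()):
                scores[story_id] += w * sw
        updated = []
        for story_id, score in scores.items():
            heap = self.top[story_id]
            if len(heap) < self.k:
                heapq.heappush(heap, (score, tweet_id))
                updated.append(story_id)
            elif score > heap[0][0]:
                # Evict the story's current lowest-scoring tweet.
                heapq.heapreplace(heap, (score, tweet_id))
                updated.append(story_id)
        return updated  # stories whose top-k list changed

    def results(self, story_id):
        # Serving a pageview reads the maintained list; no tweet index is queried.
        return [t for _, t in sorted(self.top[story_id], reverse=True)]


# Usage with made-up stories and tweets.
ps = TopKPubSub(k=2)
ps.subscribe("s1", {"election": 1.0, "senate": 0.5})
ps.subscribe("s2", {"storm": 1.0})
ps.publish("t1", {"election": 1.0})                  # s1 score 1.0
ps.publish("t2", {"senate": 1.0})                    # s1 score 0.5
ps.publish("t3", {"election": 1.0, "senate": 1.0})   # s1 score 1.5, evicts t2
print(ps.results("s1"))  # ['t3', 't1']
```

In this naive version every candidate story sharing a term is fully scored on each tweet, which is exactly the cost that efficient top-k pub-sub matching aims to reduce at a rate of over 100 million tweets a day.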
