Dense Subgraph Maintenance under Streaming Edge Weight Updates for Real-time Story Identification
Venue
The VLDB Journal (2013), pp. 1-25
Publication Year
2013
Authors
Albert Angel, Nick Koudas, Nikos Sarkas, Divesh Srivastava, Michael Svendsen, Srikanta Tirthapura
BibTeX
Abstract
Recent years have witnessed an unprecedented proliferation of social media. People
around the globe author, every day, millions of blog posts, micro-blog posts,
social network status updates, etc. This rich stream of information can be used to
identify, on an ongoing basis, emerging stories, and events that capture popular
attention. Stories can be identified via groups of tightly-coupled real-world
entities, namely the people, locations, products, etc., that are involved in the
story. The sheer scale, and rapid evolution of the data involved necessitate highly
efficient techniques for identifying important stories at every point of time. The
main challenge in real-time story identification is the maintenance of dense
subgraphs (corresponding to groups of tightly-coupled entities) under streaming
edge weight updates (resulting from a stream of user-generated content). This is
the first work to study the efficient maintenance of dense subgraphs under such
streaming edge weight updates. For a wide range of definitions of density, we
derive theoretical results regarding the magnitude of change that a single edge
weight update can cause. Based on these, we propose a novel algorithm, DynDens,
which outperforms adaptations of existing techniques to this setting, and yields
meaningful results. Our approach is validated by a thorough experimental evaluation
on large-scale real and synthetic datasets.
