Our mission is to build the most scalable library for graph algorithms and analysis and apply it to a multitude of Google products. We formalize data mining and machine learning challenges as graph problems and perform fundamental research in those fields leading to publications in top venues. Our algorithms and systems are used in a wide array of Google products such as Search, YouTube, AdWords, Play, Maps, and Social.
Balanced Partitioning splits a large graph into roughly equal parts while minimizing cut size. The problem of “fairly” dividing a graph occurs in a number of contexts, such as assigning work in a distributed processing environment. Our techniques provided a 40% drop in multi-shard queries in the Google Maps Driving Directions, saving a significant amount of CPU usage.
Our team specializes in clustering graphs at Google scale. We have efficient implementations of many different algorithms including hierarchical clustering, overlapping clustering, local clustering, and spectral clustering.
Connected Components is a fundamental subroutine in many graph algorithms. We have state-of-the-art implementations in a variety of paradigms including MapReduce, a distributed hash table, Pregel, and ASYMP. Our methods are 10-30x faster than the best previously studied algorithms, and easily scale to graphs with trillions of edges.
Our similarity ranking and centrality metrics serve as good features for understanding the characteristics of large graphs. They allow the development of link models useful for both link prediction and anomalous link discovery. Our tool Grale learns a similarity function that models the link relationships present in data.
Our research in pairwise similarity ranking has produced a number of innovative methods, which we have published at top conferences such as WWW, ICML, and VLDB. We maintain a library of similarity algorithms including distributed Personalized PageRank, Egonet similarity, Adamic Adar, and others.
Our award-winning research on novel models of graph computation addresses important issues of privacy in graph mining. Specifically, we present techniques to efficiently solve graph problems, including computing clustering, centrality scores and shortest path distances for each node, based on its personal view of the private data in the graph while preserving the privacy of each user.
We perform innovative research analyzing massive dynamic graphs. We have developed efficient algorithms for computing densest subgraph and triangle counting which operate even when subject to high velocity streaming updates.
ASYMP is graph mining framework based on asynchronous message passing. We have highly scalable code for Connected Components and shortest-path to a subset of nodes in this framework.
Google’s most famous algorithm, PageRank, is a method for computing importance scores for vertices of a directed graph. In addition to PageRank, we have scalable implementations of several other centrality scores, such as harmonic centrality.
Our library GraphBuilder can convert data from a metric space (such as document text) into a similarity graph. GraphBuilder scales to massive datasets by applying fast locality sensitive hashing and neighborhood search.
Proceedings of VLDB (2016)
WSDM (2015), pp. 419-420
WWW (2014), pp. 349-360
The 30th International Conference on Machine Learning, ICML 2013
Proceedings of the 41st Annual ACM Symposium on Theory of Computing, ACM (2009), pp. 427-434