Dennis Fetterly
Authored Publications
Robust Large-Scale Machine Learning in the Cloud
Steffen Rendle
Eugene J. Shekita
Bor-yiing Su
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA (2016)
The convergence behavior of many distributed machine learning (ML) algorithms can be sensitive to the number of machines being used or to changes in the computing environment. As a result, scaling to a large number of machines can be challenging. In this paper, we describe a new scalable coordinate descent (SCD) algorithm for generalized linear models whose convergence behavior is always the same, regardless of how far SCD is scaled out and regardless of the computing environment. This makes SCD highly robust and enables it to scale to massive datasets on low-cost commodity servers. Experimental results on a real advertising dataset from Google demonstrate SCD's cost effectiveness and scalability. Using Google's internal cloud, we show that SCD can provide near-linear scaling using thousands of cores for 1 trillion training examples on a petabyte of compressed data. This represents 10,000x more training examples than the 'large-scale' Netflix prize dataset. We also show that SCD can learn a model for 20 billion training examples in two hours for about $10.
Robust query rewriting using anchor data
Nick Craswell
Bodo Billerbeck
6th ACM Intl. Conference on Web Search and Data Mining (WSDM), ACM (2013), pp. 335-344
Of hammers and nails: an empirical comparison of three paradigms for processing large graphs
Alan Halverson
Krishnaram Kenthapadi
Sreenivas Gollapudi
WSDM (2012), pp. 103-112
Microsoft Research at TREC 2011 Web Track
The Power of Peers
Microsoft Research at TREC 2010 Web Track
Microsoft Research at TREC 2009: Web and Relevance Feedback Track
Detecting spam web pages through content analysis
Detecting phrase-level duplication on the world wide web
Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages
On the Evolution of Clusters of Near-Duplicate Web Pages
A large-scale study of the evolution of Web pages
Mark Manasse
Janet L. Wiener
Software: Practice and Experience, vol. 34 (2004), pp. 213-237
A large-scale study of the evolution of web pages
On the Evolution of Clusters of Near-Duplicate Web Pages