Dennis Fetterly

Authored Publications
    Robust Large-Scale Machine Learning in the Cloud
    Steffen Rendle
    Eugene J. Shekita
    Bor-yiing Su
    Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA (2016)
    The convergence behavior of many distributed machine learning (ML) algorithms can be sensitive to the number of machines being used or to changes in the computing environment. As a result, scaling to a large number of machines can be challenging. In this paper, we describe a new scalable coordinate descent (SCD) algorithm for generalized linear models whose convergence behavior is always the same, regardless of how much SCD is scaled out and regardless of the computing environment. This makes SCD highly robust and enables it to scale to massive datasets on low-cost commodity servers. Experimental results on a real advertising dataset from Google are used to demonstrate SCD's cost effectiveness and scalability. Using Google's internal cloud, we show that SCD can provide near linear scaling using thousands of cores for 1 trillion training examples on a petabyte of compressed data. This represents 10,000x more training examples than the 'large-scale' Netflix prize dataset. We also show that SCD can learn a model for 20 billion training examples in two hours for about $10.
    Robust query rewriting using anchor data
    Nick Craswell
    Bodo Billerbeck
    6th ACM Intl. Conference on Web Search and Data Mining (WSDM), ACM (2013), pp. 335-344
    Of hammers and nails: an empirical comparison of three paradigms for processing large graphs
    Alan Halverson
    Krishnaram Kenthapadi
    Sreenivas Gollapudi
    WSDM (2012), pp. 103-112
    Microsoft Research at TREC 2011 Web Track
    Bodo Billerbeck
    Nick Craswell
    TREC (2011)
    The Power of Peers
    Nick Craswell
    ECIR (2011), pp. 497-502
    Microsoft Research at TREC 2010 Web Track
    Nick Craswell
    TREC (2010)
    Microsoft Research at TREC 2009: Web and Relevance Feedback Track
    Nick Craswell
    Stephen Robertson
    Emine Yilmaz
    TREC (2009)
    Detecting spam web pages through content analysis
    Alexandros Ntoulas
    Mark Manasse
    WWW (2006), pp. 83-92
    Detecting phrase-level duplication on the world wide web
    Mark Manasse
    SIGIR (2005), pp. 170-177
    Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages
    Mark Manasse
    WebDB (2004), pp. 1-6
    On The Evolution of Clusters of Near-Duplicate Web Pages
    Mark Manasse
    J. Web Eng., vol. 2 (2004), pp. 228-246
    A large-scale study of the evolution of Web pages
    Mark Manasse
    Janet L. Wiener
    Softw., Pract. Exper., vol. 34 (2004), pp. 213-237
    A large-scale study of the evolution of web pages
    Mark Manasse
    Janet L. Wiener
    WWW (2003), pp. 669-678
    On the Evolution of Clusters of Near-Duplicate Web Pages
    Mark Manasse
    LA-WEB (2003), pp. 37-45