Data Mining and Modeling
The proliferation of machine learning means that learned classifiers lie at the core of many products across Google. However, questions in practice are rarely so clean that an out-of-the-box algorithm can simply be applied. A big challenge lies in developing metrics, designing experimental methodologies, and modeling the space to create parsimonious representations that capture the fundamentals of the problem. These problems cut across Google’s products and services, from designing experiments for testing new auction algorithms to developing automated metrics to measure the quality of a road map.
Data mining lies at the heart of many of these questions, and the research done at Google is at the forefront of the field. Whether it is finding more efficient algorithms for working with massive data sets, developing privacy-preserving methods for classification, or designing new machine learning approaches, our group continues to push the boundary of what is possible.
184 Publications
-
A New Approach to Optimal Code Formatting
Google, Inc. (2016)
-
ICCS 2016, 2302–2311
-
Deep Neural Networks for YouTube Recommendations
Paul Covington, Jay Adams, Emre Sargin
Proceedings of the 10th ACM Conference on Recommender Systems, ACM, New York, NY, USA (2016) (to appear)
-
Discovering Structure in the Universe of Attribute Names
Alon Halevy, Natalya Fridman Noy, Sunita Sarawagi, Steven Euijong Whang, Xiao Yu
Proc. 25th International World Wide Web Conference (2016)
-
Ego-net Community Mining Applied to Friend Suggestion
Alessandro Epasto, Silvio Lattanzi, Vahab S. Mirrokni, Ismail Sebe, Ahmed Taei, Sunita Verma
Proceedings of VLDB (2016)
-
From Freebase to Wikidata: The Great Migration
Thomas Pellissier Tanon, Denny Vrandečić, Sebastian Schaffert, Thomas Steiner, Lydia Pintscher
World Wide Web Conference, ACM (2016)
-
Hierarchical Label Propagation and Discovery for Machine Generated Email
James B. Wendt, Michael Bendersky, Lluis Garcia-Pueyo, Vanja Josifovski, Balint Miklos, Ivo Krka, Amitabh Saikia, Jie Yang, Marc-Allen Cartright, Sujith Ravi
Proceedings of the International Conference on Web Search and Data Mining (WSDM), ACM (2016), pp. 317-326
-
LLORMA: Local Low-Rank Matrix Approximation
Joonseok Lee, Seungyeon Kim, Guy Lebanon, Yoram Singer, Samy Bengio
Journal of Machine Learning Research (JMLR), vol. 17 (2016), pp. 1-24
-
Linking Users Across Domains with Location Data: Theory and Validation
Christopher Riederer, Yunsung Kim, Nitish Korula, Silvio Lattanzi, Augustin Chaintreau
WWW (2016) (to appear)
-
On Sampling Nodes in a Network
Flavio Chierichetti, Anirban Dasgupta, Ravi Kumar, Silvio Lattanzi, Tamas Sarlos
WWW (2016) (to appear)
-
Open and Closed Schema for Aligning Knowledge and Text Collections
Workshop on Exploiting Semantic Annotations for Information Retrieval (ESAIR) (2016)
-
TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size
Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, Eli Upfal
ACM SIGKDD (2016) (to appear)
-
When Recommendation Goes Wrong - Anomalous Link Discovery in Recommendation Networks
Bryan Perozzi, Michael Schueppert, Jack Saalweachter, Mayur Thakur
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016) (to appear)
-
Active Learning in Keyword Search-based Data Integration
Zhepeng Yan, Nan Zheng, Zachary G. Ives, Partha Pratim Talukdar, Cong Yu
The VLDB Journal, vol. 24 (2015), pp. 611-631
-
Applying WebTables in Practice
Sreeram Balakrishnan, Alon Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan, Afshin Rostamizadeh, Warren Shen, Kenneth Wilder, Fei Wu, Cong Yu
Conference on Innovative Data Systems Research (2015)
-
Associating Locations with Healthcare Events
Defensive Publications Series, Technical Disclosure Commons (2015)
-
Automatic Pronunciation Verification for Speech Recognition
Kanishka Rao, Fuchun Peng, Françoise Beaufays
ICASSP (2015)
-
Crowdsourcing and the Semantic Web: A Research Manifesto
Cristina Sarasua, Elena Simperl, Natasha Noy, Abraham Bernstein, Jan Marco Leimeister
Human Computation, vol. 2 (2015)
-
Discovering Subsumption Relationships for Web-Based Ontologies
Dana Movshovitz-Attias, Steven Euijong Whang, Natalya Noy, Alon Halevy
Proc. 18th International Workshop on the Web and Databases (WebDB) (2015)
-
Distributed Graph Algorithmics: Theory and Practice
Silvio Lattanzi, Vahab S. Mirrokni
WSDM (2015), pp. 419-420
-
Efficient Algorithms for Public-Private Social Networks
Flavio Chierichetti, Alessandro Epasto, Ravi Kumar, Silvio Lattanzi, Vahab Mirrokni
KDD (2015)
-
Efficient Densest Subgraph Computation in Evolving Graphs
Alessandro Epasto, Silvio Lattanzi, Mauro Sozio
WWW (2015)
-
Event Relevant Reminders
Defensive Publications Series, Technical Disclosure Commons (2015)
-
Fix It Where It Fails: Pronunciation Learning by Mining Error Corrections from Speech Logs
Zhenzhen Kou, Daisy Stanton, Fuchun Peng, Françoise Beaufays, Trevor Strohman
ICASSP (2015)
-
Focus on the Long-Term: It's better for Users and Business
Henning Hohnhold, Deirdre O'Brien, Diane Tang
Proceedings 21st Conference on Knowledge Discovery and Data Mining, ACM, Sydney, Australia (2015)
-
Improving User Topic Interest Profiles by Behavior Factorization
Zhe Zhao, Zhiyuan Cheng, Lichan Hong, Ed H. Chi
Proceedings of the 24th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland (2015), pp. 1406-1416
-
Linked Enterprise Data Model and Its Use in Real Time Analytics and Context-Driven Data Discovery
Kunal Taneja, Qian Zhu, Desmond Duggan, Teresa Tung
IEEE International Conference on Mobile Services, 1800 (2015), pp. 277-283 (to appear)
-
Mining Subjective Properties on the Web
Immanuel Trummer, Alon Halevy, Hongrae Lee, Sunita Sarawagi, Rahul Gupta
SIGMOD (2015) (to appear)
-
Scalable Community Discovery from Multi-Faceted Graphs
Ahmed Metwally, Jia-Yu Pan, Minh Doan, Christos Faloutsos
2015 IEEE International Conference on Big Data, IEEE, 445 Hoes Lane Piscataway, NJ 08854-4141 USA (to appear)
-
Secrets, Lies, and Account Recovery: Lessons from the Use of Personal Knowledge Questions at Google
Joseph Bonneau, Elie Bursztein, Ilan Caron, Rob Jackson, Mike Williamson
WWW'15 - Proceedings of the 22nd international conference on World Wide Web, ACM (2015)
-
Temporal/Spatial Calendar Events and Triggers
Defensive Publications Series, Technical Disclosure Commons (2015)
-
Unified and contrasting cuts in multiple graphs: application to medical imaging segmentation
Chia-Tung Kuo, Xiang Wang, Peter Walker, Owen Carmichael, Jieping Ye, Ian Davidson
KDD (2015), pp. 617-626
-
Biperpedia: An Ontology for Search Applications
Rahul Gupta, Alon Halevy, Xuezhi Wang, Steven Whang, Fei Wu
Proc. 40th Int'l Conf. on Very Large Data Bases (PVLDB) (2014)
-
Distributed Balanced Clustering via Mapping Coresets
Mohammadhossein Bateni, Aditya Bhaskara, Silvio Lattanzi, Vahab Mirrokni
NIPS, Neural Information Processing Systems Foundation (2014)
-
Frame by Frame Language Identification in Short Utterances using Deep Neural Networks
Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Pedro J. Moreno, Joaquin Gonzalez-Rodriguez
Neural Networks Special Issue: Neural Network Learning in Big Data (2014)
-
Great Question! Question Quality in Community Q&A
Sujith Ravi, Bo Pang, Vibhor Rastogi, Ravi Kumar
International AAAI Conference on Weblogs and Social Media (ICWSM) (2014)
-
Handcrafted Fraud and Extortion: Manual Account Hijacking in the Wild
Elie Bursztein, Borbala Benko, Daniel Margolis, Tadek Pietraszek, Andy Archer, Allan Aquino, Andreas Pitsillidis, Stefan Savage
IMC '14 Proceedings of the 2014 Conference on Internet Measurement Conference, ACM, 1600 Amphitheatre Parkway, pp. 347-358
-
Knowledge Base Completion via Search-Based Question Answering
Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, Dekang Lin
WWW (2014)
-
Near Neighbor Join
Herald Kllapi, Boulos Harb, Cong Yu
ICDE (2014)
-
On Estimating the Average Degree
Anirban Dasgupta, Ravi Kumar, Tamas Sarlos
23rd International World Wide Web Conference, WWW '14, ACM (2014) (to appear)
-
Quizz: Targeted Crowdsourcing with a Billion (Potential) Users
Panos Ipeirotis, Evgeniy Gabrilovich
WWW (2014) (to appear)
-
RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response
Úlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova
Proceedings of the 21st ACM Conference on Computer and Communications Security, ACM, Scottsdale, Arizona (2014)
-
Reducing the Sampling Complexity of Topic Models
Aaron Li, Amr Ahmed, Sujith Ravi, Alexander J Smola
ACM Conference on Knowledge Discovery and Data Mining (KDD) (2014)
-
Amr Ahmed, Abhimanyu Das, Alexander J. Smola
ACM International Conference on Web Search And Data Mining (WSDM) (2014)
-
Taxonomy Discovery for Personalized Recommendation
Yuchen Zhang, Amr Ahmed, Vanja Josifovski, Alexander J Smola
ACM International Conference on Web Search And Data Mining (WSDM) (2014)
-
Trust, but Verify: Predicting Contribution Quality for Knowledge Base Construction and Curation
Chun How Tan, Eugene Agichtein, Panos Ipeirotis, Evgeniy Gabrilovich
WSDM (2014) (to appear)
-
Unsupervised Spatial Event Detection in Targeted Domains with Applications to Civil Unrest Modeling
Liang Zhao, Feng Chen, Jing Dai, Ting Hua, Chang-Tien Lu, Naren Ramakrishnan
PLOS ONE, vol. 9 (2014), pp. 1-12
-
A Framework for Benchmarking Entity-Annotation Systems
Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita
Proceedings of the International World Wide Web Conference (WWW) (Practice & Experience Track), ACM (2013)
-
Classifying YouTube Channels: a Practical System
Proceedings of the 2nd International Workshop on Web of Linked Entities (WOLE 2013), in Proceedings of the 22nd International conference on World Wide Web companion, ACM, pp. 1295-1304
-
Compacting Large and Loose Communities
Chandrashekhar V., Shailesh Kumar, C. V. Jawahar
Asian Conference on Pattern Recognition (2013) (to appear)
-
Crawling deep web entity pages
Yeye He, Dong Xin, Venkatesh Ganti, Sriram Rajaraman, Nirav Shah
WSDM (2013), pp. 355-364
-
Crowd-Sourced Call Identification and Suppression
Daniel V. Klein, Dean K. Jackson
Federal Trade Commission Robocall Challenge (2013)
-
Data Fusion: Resolving Conflicts from Multiple Sources
Xin Luna Dong, Laure Berti-Equille, Divesh Srivastava
WAIM (2013), pp. 64-76 (to appear)
-
Dense Subgraph Maintenance under Streaming Edge Weight Updates for Real-time Story Identification
Albert Angel, Nick Koudas, Nikos Sarkas, Divesh Srivastava, Michael Svendsen, Srikanta Tirthapura
The VLDB Journal (2013), pp. 1-25
-
Distributed Large-scale Natural Graph Factorization
Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, Alexander J Smola
Proceedings of the 22nd International World Wide Web Conference (WWW 2013) (to appear)
-
Diversity maximization under matroid constraints
Zeinab Abbassi, Vahab Mirrokni, Mayur Thakur
KDD, ACM SIGKDD (2013), pp. 32-40
-
Efficient and Accurate Label Propagation on Large Graphs and Label Sets
Michele Covell, Shumeet Baluja
Proceedings International Conference on Advances in Multimedia, IARIA (2013)
-
Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
Wei Xu, Raphael Hoffmann, Le Zhao, Ralph Grishman
ACL 2013
-
Focused Matrix Factorization for Audience Selection in Display Advertising
Bhargav Kanagal, Amr Ahmed, Sandeep Pandey, Vanja Josifovski, Lluis Garcia-Pueyo, Jeff Yuan
Proceedings of the 29th International Conference on Data Engineering (ICDE) (2013)
-
From Assets to Stories via the Google Cultural Institute Platform
W. Brent Seales, Steve Crossan, Sertan Girgin, Mark Yoshitake
IEEE BigData'13 Big Data and the Humanities (2013), pp. 6 (to appear)
-
Google Disease Trends: An Update
Patrick Copeland, Raquel Romano, Tom Zhang, Greg Hecht, Dan Zigmond, Christian Stefansen
International Society of Neglected Tropical Diseases 2013, International Society of Neglected Tropical Diseases, pp. 3
-
Identifying Surrogate Geographic Research Regions with Advanced Exact Test Statistics
American Marketing Association Advanced Research Techniques Forum (2013), Poster
-
Instant Foodie: Predicting Expert Ratings From Grassroots
Chenhao Tan, Ed H. Chi, David Huffaker, Gueorgi Kossinets, Alex J. Smola
CIKM’13, Oct. 27–Nov. 1, 2013, San Francisco, CA, USA, ACM
-
KDD tutorial: The Dataminer Guide to Scalable Mixed-Membership and Nonparametric Bayesian Models
Amr Ahmed, Alexander J Smola
ACM conference on Knowledge Discovery and Data Mining (KDD) (2013) (to appear)
-
Latent Factor Models with Additive Hierarchically-smoothed User Preferences
Amr Ahmed, Bhargav Kanagal, Sandeep Pandey, Vanja Josifovski, Lluis Garcia-Pueyo
Proceedings of The 6th ACM International Conference on Web Search and Data Mining (WSDM) (2013)
-
Local Low-Rank Matrix Approximation
Joonseok Lee, Seungyeon Kim, Guy Lebanon, Yoram Singer
Proceedings of the 30th International Conference on Machine Learning (ICML), Journal of Machine Learning Research (2013)
-
Matrix Approximation under Local Low-Rank Assumption
Joonseok Lee, Seungyeon Kim, Guy Lebanon, Yoram Singer
The Learning Workshop in International Conference on Learning Representations (ICLR) (2013)
-
String Processing and Information Retrieval, Springer (2013), pp. 4
-
Optimal Hashing Schemes for Entity Matching
Nilesh Dalvi, Vibhor Rastogi, Anirban Dasgupta, Anish Das Sarma, Tamas Sarlos
22nd International World Wide Web Conference, WWW '13, ACM, Rio de Janeiro, Brazil (2013), pp. 295-306
-
Permutation Indexing: Fast Approximate Retrieval from Large Corpora
22nd International Conference on Information and Knowledge Management (CIKM), ACM (2013)
-
Elad Yom-Tov, Evgeniy Gabrilovich
Journal of Medical Internet Research, vol. 15 (2013)
-
Rolling Up Random Variables in Data Cubes
Joint Statistical Meetings, American Statistical Association, 732 North Washington Street, Alexandria, VA 22314-1943 (2013)
-
Scalable all-pairs similarity search in metric spaces
Ye Wang, Ahmed Metwally, Srinivasan Parthasarathy
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2 Pennsylvania Plaza, New York, NY (2013), pp. 829-837
-
Lipyeow Lim, Haixun Wang, Min Wang
Proceedings of the 16th International Conference on Extending Database Technology (EDBT 2013) (to appear)
-
TSum: Fast, Principled Table Summarization
Jieying Chen, Jia-Yu Pan, Christos Faloutsos, Spiros Papadimitriou
Proceedings of the Seventh International Workshop on Data Mining for Online Advertising, ACM (2013)
-
The Nested Chinese Restaurant Franchise Process: User Tracking and Document Modeling
Amr Ahmed, Liangjie Hong, Alexander J Smola
International Conference on Machine Learning (ICML) (2013) (to appear)
-
Tracking Large-Scale Video Remix in Real-World Events
Lexing Xie, Apostol Natsev, Xuming He, John R. Kender, Matthew L. Hill, John R. Smith
IEEE Transactions on Multimedia, vol. 15, no. 6 (2013), pp. 1244-1254
-
Understanding Latency of Black-Box Service Workloads
Darja Krushevskaja
WWW 2013 (to appear)
-
A Cross-Lingual Dictionary for English Wikipedia Concepts
Valentin I. Spitkovsky, Angel X. Chang
Eighth International Conference on Language Resources and Evaluation (LREC 2012)
-
An Integrated Framework for Spatio-Temporal-Textual Search and Mining
Bingsheng Wang, Haili Dong, Arnold Boedihardjo, Chang-Tien Lu, Harland Yu, Ing-Ray Chen, Jing Dai
20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), ACM, 2 Penn Plaza, Suite 701, New York, NY 10121, pp. 570-573
-
Automatically Discovering Talented Musicians with Acoustic Analysis of YouTube Videos
Eric Nichols, Charles DuHadway, Hrishikesh Aradhye, Richard F. Lyon
Proceedings of the 2012 IEEE 12th International Conference on Data Mining (ICDM), IEEE Computer Society, Washington, DC, USA, pp. 559-565
-
Budget Optimization for Online Campaigns with Positive Carryover Effects
Nikolay Archak, Vahab S. Mirrokni, S. Muthukrishnan
WINE (2012), pp. 86-99
-
Dynamic Covering for Recommendation Systems
Ioannis Antonellis, Anish Das Sarma, Shaddin Dughmi
CIKM (2012)
-
Extracting Unambiguous Keywords from Microposts Using Web and Query Logs Data
Davi Reis, Felipe Goldstein, Frederico Quintao
Making sense of Microposts (at WWW 2012)
-
FastEx: Hash Clustering with Exponential Families
Amr Ahmed, Sujith Ravi, Shravan Narayanamurthy, Alex Smola
Proceedings of the 26th Conference on Neural Information Processing Systems. (NIPS) (2012)
-
Shailesh Kumar, Chandrashekhar V., C. V. Jawahar
IEEE International Conference on Data Mining (Workshop) (2012), pp. 603-610
-
Look Who I Found: Understanding the Effects of Sharing Curated Friend Groups
Lujun Fang, Alex Fabrikant, Kristen LeFevre
Proceedings of ACM Web Science 2012, ACM, pp. 137-146
-
MedLDA: Maximum Margin Supervised Topic Models
Jun Zhu, Amr Ahmed, Eric P. Xing
Journal of Machine Learning Research (2012) (to appear)
-
Multi-skill Collaborative Teams based on Densest Subgraphs
Amita Gajewar, Atish Das Sarma
SDM (2012) (to appear)
-
Multimedia Semantics: Interactions Between Content and Community
Hari Sundaram, Lexing Xie, Munmun De Choudhury, Yu-Ru Lin, Apostol Natsev
Proceedings of the IEEE, vol. 100, no. 9 (2012)
-
Nowcasting the macroeconomy with search engine data
Hal R. Varian
Proceedings of the fifth ACM international conference on Web search and data mining, ACM, New York, NY, USA (2012), pp. 1-2
-
Online Selection of Diverse Results
Debmalya Panigrahi, Atish Das Sarma, Gagan Aggarwal, Andrew Tomkins
Proceedings of the 5th ACM international Conference on Web Search and Data Mining (2012), pp. 263-272
-
Overlapping clusters for distributed computation
Reid Andersen, David Gleich, Vahab Mirrokni
ACM Conference on Web Search and Data Mining (WSDM) (2012)
-
PageRank on an evolving graph
Bahman Bahmani, Ravi Kumar, Mohammad Mahdian, Eli Upfal
KDD (2012), pp. 24-32
-
Spotting fake reviewer groups in consumer reviews
Arjun Mukherjee, Bing Liu, Natalie Glance
Proceedings of the 21st international conference on World Wide Web, ACM, New York, NY, USA (2012), pp. 191-200
-
Student-t based Robust Spatio-Temporal Prediction
Yang Chen, Feng Chen, Jing Dai, T. Charles Clancy, Yao-Jan Wu
IEEE 12th International Conference on Data Mining, IEEE, Brussels, Belgium (2012), pp. 151-160
-
Mirjam Wattenhofer, Roger Wattenhofer, Zack Zhu
Sixth International AAAI Conference on Weblogs and Social Media (ICWSM 2012)
-
V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors
Ahmed Metwally, Christos Faloutsos
PVLDB Proceedings of the VLDB Endowment, vol. 5 (2012), pp. 704-715
-
Vote calibration in community question-answering systems
Bee-Chung Chen, Anirban Dasgupta, Xuanhui Wang, Jie Yang
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (2012), pp. 781-790
-
Web-Scale Multi-Task Feature Selection for Behavioral Targeting
Amr Ahmed, Mohamed Aly, Abhimanyu Das, Alex Smola, Tasos Anastasakos
Proceedings of The 21st ACM International Conference on Information and Knowledge Management (CIKM), ACM (2012) (to appear)
-
YouTube around the world: geographic popularity of videos
Anders Brodersen, Salvatore Scellato, Mirjam Wattenhofer
Proceedings of the 21st international conference on World Wide Web, ACM, New York, NY, USA (2012), pp. 241-250
-
Your Two Weeks of Fame and Your Grandmother's
James Cook, Atish Das Sarma, Alex Fabrikant, Andrew Tomkins
WWW (2012)
-
A Tale of Two (Similar) Cities: Inferring City Similarity Through Geo-Spatial Query Log Analysis
Rohan Seth, Michele Covell, Deepak Ravichandran, D. Sivakumar, Shumeet Baluja
Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (2011)
-
Tom Broxton, Yannet Interian, Jon Vaver, Mirjam Wattenhofer
Journal of Intelligent Information Systems (2011), pp. 1-19
-
Detecting Adversarial Advertisements in the Wild
D. Sculley, Matthew Eric Otey, Michael Pohl, Bridget Spitznagel, John Hainsworth, Yunkai Zhou
Proceedings of the 17th ACM SIGKDD International Conference on Data Mining and Knowledge Discovery, KDD (2011)
-
Efficient Search Engine Measurements
Ziv Bar-Yossef, Maxim Gurevich
ACM Transactions on the Web, vol. 5, no. 4 (2011), pp. 18
-
Efficient Spectral Neighborhood Blocking for Entity Resolution
Liangcai Shu, Aiyou Chen, Ming Xiong, Weiyi Meng
International Conference on Data Engineering 2011 (ICDE), IEEE, pp. 1-12
-
Estimating the Number of Users behind IPs for Combating Abusive Traffic
Ahmed Metwally, Matt Paduano
SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2 Penn Plaza New York, NY 10121-0799 (2011), pp. 249-257
-
Fast Algorithms for Finding Extremal Sets
Roberto J. Bayardo, Biswanath Panda
Proc. of the 2011 SIAM Int'l Conf. on Data Mining (to appear)
-
Frequent Pattern Discovery and Association Rule Mining of XML Data
Qin Ding, Gnanasekaran Sundarraj
XML Data Mining: Models, Methods, and Applications, IGI Publishing (2011) (to appear)
-
Influence Maximization in Social Networks When Negative Opinions May Emerge and Propagate
SIAM 2011 International Conference on Data Mining, SIAM, Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA, pp. 379-390
-
Interactive Itinerary Planning
Senjuti Basu-Roy, Gautam Das, Sihem Amer-Yahia, Cong Yu
ICDE (2011)
-
Large Scale Page-Based Book Similarity Clustering
Nemanja Spasojevic, Guillaume Poncin
ICDAR 2011
-
Large-scale community detection on YouTube for Topic Discovery and Exploration
Ullas Gargi, Wenjun Lu, Vahab Mirrokni, Sangho Yoon
AAAI Conference on Weblogs and Social Media 2011
-
Learning to Target: What Works for Behavioral Targeting
Sandeep Pandey, Mohamed Aly, Abraham Bagherjeiran, Andrew Hatch, Peter Ciccolo, Adwait Ratnaparkhi, Martin Zinkevich
CIKM '11, ACM, Glasgow, Scotland, UK (2011), pp. 1805-1814
-
MRI: Meaningful Interpretations of Collaborative Ratings
Mahashweta Das, Sihem Amer-Yahia, Gautam Das, Cong Yu
VLDB (2011)
-
Personalized Social Recommendations - Accurate or Private?
Ashwin Machanavajjhala, Aleksandra Korolova, Atish Das Sarma
Very Large Data Bases (VLDB) (2011)
-
Stanford's Distantly-Supervised Slot-Filling System
Mihai Surdeanu, Sonal Gupta, John Bauer, David McClosky, Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning
Fourth Text Analysis Conference (TAC 2011)
-
Stanford-UBC Entity Linking at TAC-KBP, Again
Angel X. Chang, Valentin I. Spitkovsky, Eneko Agirre, Christopher D. Manning
Fourth Text Analysis Conference (TAC 2011)
-
Strong Baselines for Cross-Lingual Entity Linking
Angel X. Chang, Valentin I. Spitkovsky
Fourth Text Analysis Conference (TAC 2011)
-
Suggesting (More) Friends Using the Implicit Social Graph
Maayan Roth, Tzvika Barenholz, Assaf Ben-David, David Deutscher, Guy Flysher, Avinatan Hassidim, Ilan Horn, Ari Leichtberg, Naty Leiser, Yossi Matias, Ron Merom
International Conference on Machine Learning (ICML) (2011)
-
Unary Data Structures for Language Models
Jeffrey Sorensen, Cyril Allauzen
Interspeech 2011, International Speech Communication Association, pp. 1425-1428
-
A Simple Distant Supervision Approach for the TAC-KBP Slot Filling Task
Mihai Surdeanu, David McClosky, Julie Tibshirani, John Bauer, Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning
Third Text Analysis Conference (TAC 2010)
-
AdHeat: An Influence-based Diffusion Model for Propagating Hints to Match Ads
Hongji Bao, Edward Y. Chang
Proceedings of WWW2010, IW3C2, pp. 71-80
-
Tom Broxton, Yannet Interian, Jon Vaver, Mirjam Wattenhofer
IEEE SIASP@ICDM 2010
-
Confucius and Its Intelligent Disciples: Integrating Social with Search
Xiance Si, Edward Y. Chang, Zoltan Gyongyi, Maosong Sun
Proceedings of VLDB 2010, 36th International Conference on Very Large Data Bases, VLDB Endowment, pp. 1505-1516
-
Evaluating Online Ad Campaigns in a Pipeline: Causal Models at Scale
David Chan, Rong Ge, Ori Gershony, Tim Hesterberg, Diane Lambert
Proceedings of ACM SIGKDD 2010, pp. 7-15
-
Improved classification through runoff elections
Oleg Golubitsky, Stephen M. Watt
Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, ACM, Boston (2010), pp. 59-64
-
Mining Arabic Business Reviews
Mohamed Elhawary, Mohamed Elfeky
IEEE, pp. 1108-1113
-
Mining advertiser-specific user behavior using adfactors
Nikolay Archak, Vahab S. Mirrokni, S. Muthukrishnan
WWW (2010), pp. 31-40
-
Overlapping Experiment Infrastructure: More, Better, Faster Experimentation
Diane Tang, Ashish Agarwal, Deirdre O'Brien, Mike Meyer
Proceedings 16th Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC (2010), pp. 17-26
-
PSVM: Parallel Support Vector Machines with Incomplete Cholesky Factorization
Edward Y. Chang, Hongjie Bai, Kaihua Zhu, Hao Wang, Jian Li, Zhihuan Qiu
Scaling Up Machine Learning, Cambridge University Press (2010)
-
Stanford-UBC Entity Linking at TAC-KBP
Angel X. Chang, Valentin I. Spitkovsky, Eric Yeh, Eneko Agirre, Christopher D. Manning
Third Text Analysis Conference (TAC 2010)
-
Ad Quality On TV: Predicting Television Audience Retention
Yannet Interian, Sundar Dorai-Raj, Igor Naverniouk, P. J. Opalinski, Kaustuv, Dan Zigmond
Proceedings of ADKDD (2009)
-
An incentive-based architecture for social recommendations
Rajat Bhattacharjee, Ashish Goel, Konstantinos Kollias
RecSys '09: Proceedings of the third ACM conference on Recommender systems, ACM, New York, NY, USA (2009), pp. 229-232
-
Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior
Wen-Yen Chen, Jon Chu, Junyi Luan, Hongjie Bai, Edward Chang
18th International Conference on World Wide Web (WWW), ACM (2009), pp. 681-690
-
Maryam Kamvar, Melanie Kellar, Rajan Patel, Ya Xu
WWW 2009 MADRID, pp. 801-810
-
Do Viewers Care? Understanding the impact of ad creatives on TV viewing behavior
Yannet Interian, Kaustuv, Igor Naverniouk, P. J. Opalinski, Sundar Dorai-raj, Dan Zigmond
Re:Think 2009
-
Efficient Clustering of Web-Derived Data Sets
Luís Sarmento, Alexander Kehlenbeck, Eugénio C. Oliveira, Lyle Ungar
MLDM '09: Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition, Springer-Verlag, Berlin, Heidelberg (2009), pp. 398-412
-
Finding topic trends in digital libraries
Levent Bolelli, Seyda Ertekin, Ding Zhou, C. Lee Giles
JCDL '09: Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, ACM, New York, NY, USA (2009), pp. 69-72
-
Going Mini: Extreme Lightweight Spam Filters
D. Sculley, Gordon V. Cormack
CEAS 2009: Proceedings of the Sixth Conference on Email and Anti-Spam
-
On the Predictability of Search Trends
Yair Shimshoni, Niv Efron, Yossi Matias
Google (2009)
-
PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce
Biswanath Panda, Joshua S. Herbach, Sugato Basu, Roberto J. Bayardo
Proceedings of the 35th International Conference on Very Large Data Bases (VLDB-2009)
-
Parallel community detection on large networks with propinquity dynamics
Yuzhou Zhang, Jianyong Wang, Yi Wang, Lizhu Zhou
KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, New York, NY, USA (2009), pp. 997-1006
-
Predicting Bounce Rates in Sponsored Search Advertisements
D. Sculley, Robert Malkin, Sugato Basu, Roberto J. Bayardo
Proc. of the 15th International ACM-SIGKDD Conference on Knowledge Discovery and Data Mining, ACM (2009), pp. 1325-1334
-
Preference aggregation in group recommender systems for committee decision-making
Jacob P. Baskin, Shriram Krishnamurthi
RecSys '09: Proceedings of the third ACM conference on Recommender systems, ACM, New York, NY, USA (2009), pp. 337-340
-
Pricing Guidance in Ad Sale Negotiations: The Print Ads Example
Adam Juda, S. N. Muthukrishnan, Ashish Rastogi
The Third Annual International Workshop on Data Mining and Audience Intelligence for Advertising (2009)
-
Rating aggregation in collaborative filtering systems
Florent Garcin, Boi Faltings, Radu Jurca, Nadine Joswig
RecSys '09: Proceedings of the third ACM conference on Recommender systems, ACM, New York, NY, USA (2009), pp. 349-352
-
Scalable Attribute-Value Extraction from Semi-Structured Text
Yuk Wah Wong, Dominic Widdows, Tom Lokovic, Kamal Nigam
ICDM Workshop on Large-scale Data Mining: Theory and Applications (2009)
-
Stanford-UBC at TAC-KBP
Eneko Agirre, Angel X. Chang, Daniel S. Jurafsky, Christopher D. Manning, Valentin I. Spitkovsky, Eric Yeh
Second Text Analysis Conference (TAC 2009)
-
Succinct approximate counting of skewed data
IJCAI-09 Proceedings (2009), pp. 1243-1248
-
Text Classification Through Time: Efficient Label Propagation in Time-Based Graphs
Shumeet Baluja, Deepak Ravichandran, D. Sivakumar
International Conference on Knowledge Discovery and Information Retrieval (2009)
-
The Unreasonable Effectiveness of Data
Alon Halevy, Peter Norvig, Fernando Pereira
IEEE Intelligent Systems, vol. 24 (2009), pp. 8-12
-
Tour the world: a technical demonstration of a web-scale landmark recognition engine
Yan-Tao Zheng, Ming Zhao, Yang Song, Hartwig Adam, Ulrich Buddemeier, Alessandro Bissacco, Fernando Brucher, Tat-Seng Chua, Hartmut Neven, Jay Yagnik
MM '09: Proceedings of the 17th ACM international conference on Multimedia, ACM, New York, NY, USA (2009), pp. 961-962
-
Video2Text: Learning to Annotate Video Content
Hrishikesh Aradhye, George Toderici, Jay Yagnik
ICDM Workshop on Internet Multimedia Mining (2009)
-
A Social Query Model for Decentralized Search
Arindam Banerjee, Sugato Basu
Second ACM Workshop on Social Network Mining and Analysis at the KDD Conference (SNAKDD-08) (2008)
-
Bootstrapping Information Extraction from Semi-structured Web Pages
Andrew Carlson, Charles Schafer
ECML/PKDD, Springer Lecture Notes in Computer Science Volume 5211/2008 (2008), pp. 16
-
Constrained Clustering: Advances in Algorithms, Theory, and Applications
Sugato Basu, Ian Davidson, Kiri Wagstaff
CRC Press (2008)
-
Detecting Image Spam using Visual Features and Near Duplicate Detection
Bhaskar Mehta, Saurabh Nangia, Manish Gupta, Wolfgang Nejdl
Proceedings of WWW 2008
-
Efficient Concept Clustering for Ontology Learning using an Event Life Cycle on the Web
Sangsoo Sung, Seokkyung Chung, Dennis McLeod
Proc. 2008 ACM Symposium on Applied Computing, ACM, Fortaleza, Brazil, pp. 2310-2314
-
Exploring a Digital Library Through Key Ideas
JCDL, Pittsburgh, Pennsylvania, USA (2008), pp. 177-186
-
Extreme Data Mining
Sridhar Ramaswamy
Proceedings 2008 ACM SIGMOD International Conference on Management of Data, ACM, Vancouver, pp. 1-2
-
Modeling Online Reviews with Multi-Grain Topic Models
Ivan Titov, Ryan McDonald
17th International World Wide Web Conference (2008)
-
PFP: Parallel FP-Growth for Query Recommendation
Haoyuan Li, Yi Wang, Dong Zhang, Edward Chang, Ming Zhang
ACM Recommendation Systems (2008) (to appear)
-
Video Suggestion and Discovery for YouTube: Taking Random Walks Through the View Graph
Shumeet Baluja, Rohan Seth, D. Sivakumar, Yushi Jing, Jay Yagnik, Shankar Kumar, Deepak Ravichandran, Mohamed Aly
WWW-2008
-
A Support Vector Approach to Censored Targets
Pannagadatta Shivaswamy, Wei Chu, Martin Jansche
Seventh IEEE International Conference on Data Mining (ICDM) (2007), pp. 655-660
-
Clustering Billions of Images with Large Scale Nearest Neighbor Search
Ting Liu, Chuck Rosenberg, Henry A. Rowley
IEEE Workshop on Applications of Computer Vision, IEEE (2007)
-
Google Book Search: Document Understanding on a Massive Scale
Proc. Ninth International Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society, Washington, DC (2007), pp. 819-823
-
Mining API patterns as partial orders from source code: from usage scenarios to specifications
Mithun Acharya, Tao Xie, Jian Pei, Jun Xu
Proc. ACM SIGSOFT Symposium on The Foundations of Software Engineering, ACM, Dubrovnik, Croatia (2007), pp. 25-34
-
Relational Clustering by Symmetric Convex Coding
Bo Long, Zhongfei (Mark) Zhang, Xiaoyun Wu, Philip S. Yu
Proc. 24th ICML, ACM, Corvallis (2007), pp. 569-576
-
Scaling Up All Pairs Similarity Search
Roberto Bayardo, Yiming Ma, Ramakrishnan Srikant
Proc. of the 16th Int'l Conf. on the World Wide Web (2007)
-
Cluster Ranking with an Application to Mining Mailbox Networks
Ziv Bar-Yossef, Ido Guy, Ronny Lempel, Yoelle S. Maarek, Vladimir Soroka
ICDM (2006), pp. 63-74
-
Dense Subgraph Extraction
David Gibson, Ravi Kumar, Kevin S. McCurley, Andrew Tomkins
in: Mining Graph Data, John Wiley & Sons (2006), pp. 411-441
-
Mining for proposal reviewers: lessons learned at the national science foundation
Seth Hettich, Michael J. Pazzani
Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Philadelphia, PA (2006), pp. 862-871
-
Mining the Web to Determine Similarity Between Words, Objects, and Communities
Mehran Sahami
Proceedings of the 19th International FLAIRS Conference (FLAIRS-2006)
-
New cached-sufficient statistics algorithms for quickly answering statistical questions
Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Philadelphia, PA (2006), pp. 2
-
Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping
Mikhail Bilenko, Sugato Basu, Mehran Sahami
Proceedings of the 5th IEEE International Conference on Data Mining (2005), pp. 58-65
-
Evaluating similarity measures: a large-scale study in the orkut social network
Ellen Spertus, Mehran Sahami, Orkut Buyukkokten
Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2005), pp. 678-684
-
Unweaving a web of documents
R. Guha, Ravi Kumar, D. Sivakumar, Ravi Sundaram
KDD (2005), pp. 574-579
-
A social network caught in the Web
Lada A. Adamic, Orkut Buyukkokten, Eytan Adar
First Monday, vol. 8 (2003)
-
Mining Optimized Gain Rules for Numeric Attributes
Sergey Brin, Rajeev Rastogi, Kyuseok Shim
IEEE Trans. Knowl. Data Eng., vol. 15 (2003), pp. 324-338
-
PowerPoint: Shot with its own bullets
The Lancet, vol. 362(9381) (2003), pp. 343-344
-
Extracting Patterns and Relations from the World Wide Web
WebDB (1998), pp. 172-183
-
Scalable Techniques for Mining Causal Structures
Craig Silverstein, Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman
VLDB (1998), pp. 594-605
