Data Mining

149 Publications

  •    

    Biperpedia: An Ontology for Search Applications

    Rahul Gupta, Alon Halevy, Xuezhi Wang, Steven Whang, Fei Wu

    Proc. 40th Int'l Conf. on Very Large Data Bases (PVLDB) (2014)

  •  

    Distributed Balanced Clustering via Mapping Coresets

    Mohammadhossein Bateni, Aditya Bhaskara, Silvio Lattanzi, Vahab Mirrokni

    NIPS, Neural Information Processing Systems Foundation (2014) (to appear)

  •    

    Frame by Frame Language Identification in Short Utterances using Deep Neural Networks

    Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Pedro J. Moreno, Joaquin Gonzalez-Rodriguez

    Neural Networks Special Issue: Neural Network Learning in Big Data (2014) (to appear)

  •   

    Great Question! Question Quality in Community Q&A

    Sujith Ravi, Bo Pang, Vibhor Rastogi, Ravi Kumar

    International AAAI Conference on Weblogs and Social Media (ICWSM) (2014)

  •    

    Knowledge Base Completion via Search-Based Question Answering

    Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, Dekang Lin

    WWW (2014)

  •   

    Near Neighbor Join

    Herald Kllapi, Boulos Harb, Cong Yu

    ICDE (2014)

  •   

    On Estimating the Average Degree

    Anirban Dasgupta, Ravi Kumar, Tamas Sarlos

    23rd International World Wide Web Conference, WWW '14, ACM (2014) (to appear)

  •    

    Quizz: Targeted Crowdsourcing with a Billion (Potential) Users

    Panos Ipeirotis, Evgeniy Gabrilovich

    WWW (2014) (to appear)

  •    

    RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response

    Úlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova

    Proceedings of the 21st ACM Conference on Computer and Communications Security, ACM, Scottsdale, Arizona (2014) (to appear)

  •    

    Reducing the Sampling Complexity of Topic Models

    Aaron Li, Amr Ahmed, Sujith Ravi, Alexander J Smola

    ACM Conference on Knowledge Discovery and Data Mining (KDD) (2014)

  •    

    Scalable Hierarchical Multitask Learning Algorithms for Conversion Optimization in Display Advertising

    Amr Ahmed, Abhimanyu Das, Alexander J. Smola

    ACM International Conference on Web Search And Data Mining (WSDM) (2014)

  •    

    Taxonomy Discovery for Personalized Recommendation

    Yuchen Zhang, Amr Ahmed, Vanja Josifovski, Alexander J Smola

    ACM International Conference on Web Search And Data Mining (WSDM) (2014)

  •    

    Trust, but Verify: Predicting Contribution Quality for Knowledge Base Construction and Curation

    Chun How Tan, Eugene Agichtein, Panos Ipeirotis, Evgeniy Gabrilovich

    WSDM (2014) (to appear)

  •    

    Unsupervised Spatial Event Detection in Targeted Domains with Applications to Civil Unrest Modeling

    Liang Zhao, Feng Chen, Jing Dai, Ting Hua, Chang-Tien Lu, Naren Ramakrishnan

    PLOS ONE, vol. 9 (2014), pp. 1-12

  •    

    A Framework for Benchmarking Entity-Annotation Systems

    Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita

    Proceedings of the International World Wide Web Conference (WWW) (Practice & Experience Track), ACM (2013) (to appear)

  •    

    Classifying YouTube Channels: a Practical System

    Vincent Simonet

    Proceedings of the 2nd International Workshop on Web of Linked Entities (WOLE 2013), in Proceedings of the 22nd International conference on World Wide Web companion, ACM, pp. 1295-1304

  •   

    Compacting Large and Loose Communities

    Chandrashekhar V., Shailesh Kumar, C. V. Jawahar

    Asian Conference on Pattern Recognition (2013) (to appear)

  •  

    Crawling deep web entity pages

    Yeye He, Dong Xin, Venkatesh Ganti, Sriram Rajaraman, Nirav Shah

    WSDM (2013), pp. 355-364

  •    

    Crowd-Sourced Call Identification and Suppression

    Daniel V. Klein, Dean K. Jackson

    Federal Trade Commission Robocall Challenge (2013)

  •    

    Data Fusion: Resolving Conflicts from Multiple Sources

    Xin Luna Dong, Laure Berti-Equille, Divesh Srivastava

    WAIM (2013), pp. 64-76 (to appear)

  •    

    Dense Subgraph Maintenance under Streaming Edge Weight Updates for Real-time Story Identification

    Albert Angel, Nick Koudas, Nikos Sarkas, Divesh Srivastava, Michael Svendsen, Srikanta Tirthapura

    The VLDB Journal (2013), pp. 1-25

  •   

    Distributed Large-scale Natural Graph Factorization

    Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, Alexander J Smola

    Proceedings of the 22nd International World Wide Web Conference (WWW 2013) (to appear)

  •    

    Diversity maximization under matroid constraints

    Zeinab Abbassi, Vahab Mirrokni, Mayur Thakur

    KDD, ACM SIGKDD (2013), pp. 32-40

  •    

    Efficient and Accurate Label Propagation on Large Graphs and Label Sets

    Michele Covell, Shumeet Baluja

    Proceedings International Conference on Advances in Multimedia, IARIA (2013)

  •   

    Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

    Wei Xu, Raphael Hoffmann, Le Zhao, Ralph Grishman

    ACL 2013

  •   

    Focused Matrix Factorization for Audience Selection in Display Advertising

    Bhargav Kanagal, Amr Ahmed, Sandeep Pandey, Vanja Josifovski, Lluis Garcia-Pueyo, Jeff Yuan

    Proceedings of the 29th International Conference on Data Engineering (ICDE) (2013) (to appear)

  •   

    From Assets to Stories via the Google Cultural Institute Platform

    W. Brent Seales, Steve Crossan, Sertan Girgin, Mark Yoshitake

    IEEE BigData'13 Big Data and the Humanities (2013), pp. 6 (to appear)

  •    

    Google Disease Trends: An Update

    Patrick Copeland, Raquel Romano, Tom Zhang, Greg Hecht, Dan Zigmond, Christian Stefansen

    International Society of Neglected Tropical Diseases 2013, International Society of Neglected Tropical Diseases, pp. 3

  •   

    Identifying Surrogate Geographic Research Regions with Advanced Exact Test Statistics

    Steven Ellis

    American Marketing Association Advanced Research Techniques Forum (2013), Poster

  •    

    Instant Foodie: Predicting Expert Ratings From Grassroots

    Chenhao Tan, Ed H. Chi, David Huffaker, Gueorgi Kossinets, Alex J. Smola

    CIKM’13, Oct. 27–Nov. 1, 2013, San Francisco, CA, USA, ACM

  •  

    KDD tutorial: The Dataminer Guide to Scalable Mixed-Membership and Nonparametric Bayesian Models

    Amr Ahmed, Alexander J Smola

    ACM conference on Knowledge Discovery and Data Mining (KDD) (2013) (to appear)

  •   

    Latent Factor Models with Additive Hierarchically-smoothed User Preferences

    Amr Ahmed, Bhargav Kanagal, Sandeep Pandey, Vanja Josifovski, Lluis Garcia-Pueyo

    Proceedings of The 6th ACM International Conference on Web Search and Data Mining (WSDM) (2013) (to appear)

  •   

    Nowcasting with Google Trends

    Yossi Matias

    String Processing and Information Retrieval, Springer (2013), pp. 4

  •    

    Optimal Hashing Schemes for Entity Matching

    Nilesh Dalvi, Vibhor Rastogi, Anirban Dasgupta, Anish Das Sarma, Tamas Sarlos

    22nd International World Wide Web Conference, WWW '13, ACM, Rio de Janeiro, Brazil (2013), pp. 295-306

  •   

    Permutation Indexing: Fast Approximate Retrieval from Large Corpora

    Maxim Gurevich, Tamas Sarlos

    22nd International Conference on Information and Knowledge Management (CIKM), ACM (2013)

  •    

    Postmarket Drug Surveillance Without Trial Costs: Discovery of Adverse Drug Reactions Through Large-Scale Analysis of Web Search Queries

    Elad Yom-Tov, Evgeniy Gabrilovich

    Journal of Medical Internet Research, vol. 15 (2013)

  •    

    Rolling Up Random Variables in Data Cubes

    Phillip M. Yelland

    Joint Statistical Meetings, American Statistical Association, 732 North Washington Street, Alexandria, VA 22314-1943 (2013) (to appear)

  •    

    Scalable all-pairs similarity search in metric spaces

    Ye Wang, Ahmed Metwally, Srinivasan Parthasarathy

    Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2 Pennsylvania Plaza, New York, NY (2013), pp. 829-837

  •    

    Semantic Queries by Example

    Lipyeow Lim, Haixun Wang, Min Wang

    Proceedings of the 16th International Conference on Extending Database Technology (EDBT 2013) (to appear)

  •    

    TSum: Fast, Principled Table Summarization

    Jieying Chen, Jia-Yu Pan, Christos Faloutsos, Spiros Papadimitriou

    Proceedings of the Seventh International Workshop on Data Mining for Online Advertising, ACM (2013)

  •   

    The Nested Chinese Restaurant Franchise Process: User Tracking and Document Modeling

    Amr Ahmed, Liangjie Hong, Alexander J Smola

    International Conference on Machine Learning (ICML) (2013) (to appear)

  •    

    Tracking Large-Scale Video Remix in Real-World Events

    Lexing Xie, Apostol Natsev, Xuming He, John R. Kender, Matthew L. Hill, John R. Smith

    IEEE Transactions on Multimedia, vol. 15, no. 6 (2013), pp. 1244-1254

  •  

    Understanding Latency of Black-Box Service Workloads

    Darja Krushevskaja

    WWW 2013 (to appear)

  •   

    A Cross-Lingual Dictionary for English Wikipedia Concepts

    Valentin I. Spitkovsky, Angel X. Chang

    Eighth International Conference on Language Resources and Evaluation (LREC 2012)

  •    

    An Integrated Framework for Spatio-Temporal-Textual Search and Mining

    Bingsheng Wang, Haili Dong, Arnold Boedihardjo, Chang-Tien Lu, Harland Yu, Ing-Ray Chen, Jing Dai

    20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), ACM, 2 Penn Plaza, Suite 701, New York, NY 10121, pp. 570-573

  •    

    Automatically Discovering Talented Musicians with Acoustic Analysis of YouTube Videos

    Eric Nichols, Charles DuHadway, Hrishikesh Aradhye, Richard F. Lyon

    Proceedings of the 2012 IEEE 12th International Conference on Data Mining (ICDM), IEEE Computer Society, Washington, DC, USA, pp. 559-565

  •   

    Budget Optimization for Online Campaigns with Positive Carryover Effects

    Nikolay Archak, Vahab S. Mirrokni, S. Muthukrishnan

    WINE (2012), pp. 86-99

  •    

    Dynamic Covering for Recommendation Systems

    Ioannis Antonellis, Anish Das Sarma, Shaddin Dughmi

    CIKM (2012)

  •    

    Extracting Unambiguous Keywords from Microposts Using Web and Query Logs Data

    Davi Reis, Felipe Goldstein, Frederico Quintao

    Making sense of Microposts (at WWW 2012)

  •   

    FastEx: Hash Clustering with Exponential Families

    Amr Ahmed, Sujith Ravi, Shravan Narayanamurthy, Alex Smola

    Proceedings of the 26th Conference on Neural Information Processing Systems (NIPS) (2012)

  •    

    Logical Itemset Mining

    Shailesh Kumar, Chandrashekhar V., C. V. Jawahar

    IEEE International Conference on Data Mining (Workshop) (2012), pp. 603-610

  •    

    Look Who I Found: Understanding the Effects of Sharing Curated Friend Groups

    Lujun Fang, Alex Fabrikant, Kristen LeFevre

    Proceedings of ACM Web Science 2012, ACM, pp. 137-146

  •    

    MedLDA: Maximum Margin Supervised Topic Models

    Jun Zhu, Amr Ahmed, Eric P. Xing

    Journal of Machine Learning Research (2012) (to appear)

  •  

    Multi-skill Collaborative Teams based on Densest Subgraphs

    Amita Gajewar, Atish Das Sarma

    SDM (2012) (to appear)

  •   

    Multimedia Semantics: Interactions Between Content and Community

    Hari Sundaram, Lexing Xie, Munmun De Choudhury, Yu-Ru Lin, Apostol Natsev

    Proceedings of the IEEE, vol. 100, no. 9 (2012)

  •   

    Nowcasting the macroeconomy with search engine data

    Hal R. Varian

    Proceedings of the fifth ACM international conference on Web search and data mining, ACM, New York, NY, USA (2012), pp. 1-2

  •   

    Online Selection of Diverse Results

    Debmalya Panigrahi, Atish Das Sarma, Gagan Aggarwal, Andrew Tomkins

    Proceedings of the 5th ACM international Conference on Web Search and Data Mining (2012), pp. 263-272

  •  

    Overlapping clusters for distributed computation

    Reid Andersen, David Gleich, Vahab Mirrokni

    ACM Conference on Web Search and Data Mining (WSDM) (2012)

  •   

    PageRank on an evolving graph

    Bahman Bahmani, Ravi Kumar, Mohammad Mahdian, Eli Upfal

    KDD (2012), pp. 24-32

  •   

    Spotting fake reviewer groups in consumer reviews

    Arjun Mukherjee, Bing Liu, Natalie Glance

    Proceedings of the 21st international conference on World Wide Web, ACM, New York, NY, USA (2012), pp. 191-200

  •    

    Student-t based Robust Spatio-Temporal Prediction

    Yang Chen, Feng Chen, Jing Dai, T. Charles Clancy, Yao-Jan Wu

    IEEE 12th International Conference on Data Mining, IEEE, Brussels, Belgium (2012), pp. 151-160

  •    

    The YouTube Social Network

    Mirjam Wattenhofer, Roger Wattenhofer, Zack Zhu

    Sixth International AAAI Conference on Weblogs and Social Media (ICWSM 2012) (to appear)

  •   

    Understanding cyclic trends in social choices

    Anish Das Sarma, Sreenivas Gollapudi, Rina Panigrahy, Li Zhang

    Proceedings of the fifth ACM international conference on Web search and data mining, ACM, New York, NY, USA (2012), pp. 593-602

  •    

    V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors

    Ahmed Metwally, Christos Faloutsos

    PVLDB Proceedings of the VLDB Endowment, vol. 5 (2012), pp. 704-715

  •    

    Vote calibration in community question-answering systems

    Bee-Chung Chen, Anirban Dasgupta, Xuanhui Wang, Jie Yang

    SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (2012), pp. 781-790

  •    

    Web-Scale Multi-Task Feature Selection for Behavioral Targeting

    Amr Ahmed, Mohamed Aly, Abhimanyu Das, Alex Smola, Tasos Anastasakos

    Proceedings of The 21st ACM International Conference on Information and Knowledge Management (CIKM), ACM (2012) (to appear)

  •   

    YouTube around the world: geographic popularity of videos

    Anders Brodersen, Salvatore Scellato, Mirjam Wattenhofer

    Proceedings of the 21st international conference on World Wide Web, ACM, New York, NY, USA (2012), pp. 241-250

  •   

    Your Two Weeks of Fame and Your Grandmother's

    James Cook, Atish Das Sarma, Alex Fabrikant, Andrew Tomkins

    WWW (2012)

  •    

    A Tale of Two (Similar) Cities: Inferring City Similarity Through Geo-Spatial Query Log Analysis

    Rohan Seth, Michele Covell, Deepak Ravichandran, D. Sivakumar, Shumeet Baluja

    Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (2011)

  •    

    Catching a viral video

    Tom Broxton, Yannet Interian, Jon Vaver, Mirjam Wattenhofer

    Journal of Intelligent Information Systems (2011), pp. 1-19

  •    

    Detecting Adversarial Advertisements in the Wild

    D. Sculley, Matthew Eric Otey, Michael Pohl, Bridget Spitznagel, John Hainsworth, Yunkai Zhou

    Proceedings of the 17th ACM SIGKDD International Conference on Data Mining and Knowledge Discovery, KDD (2011)

  •  

    Efficient Search Engine Measurements

    Ziv Bar-Yossef, Maxim Gurevich

    ACM Transactions on the Web, vol. 5, no. 4 (2011), pp. 18

  •    

    Efficient Spectral Neighborhood Blocking for Entity Resolution

    Liangcai Shu, Aiyou Chen, Ming Xiong, Weiyi Meng

    International Conference on Data Engineering 2011 (ICDE), IEEE, pp. 1-12

  •   

    Estimating the Number of Users behind IPs for Combating Abusive Traffic

    Ahmed Metwally, Matt Paduano

    SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2 Penn Plaza New York, NY 10121-0799 (2011), pp. 249-257

  •    

    Fast Algorithms for Finding Extremal Sets

    Roberto J. Bayardo, Biswanath Panda

    Proc. of the 2011 SIAM Int'l Conf. on Data Mining (to appear)

  •  

    Frequent Pattern Discovery and Association Rule Mining of XML Data

    Qin Ding, Gnanasekaran Sundarraj

    XML Data Mining: Models, Methods, and Applications, IGI Publishing (2011) (to appear)

  •   

    Influence Maximization in Social Networks When Negative Opinions May Emerge and Propagate

    Alex Collins

    SIAM 2011 International Conference on Data Mining, SIAM, Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA, pp. 379-390

  •   

    Interactive Itinerary Planning

    Senjuti Basu-Roy, Gautam Das, Sihem Amer-Yahia, Cong Yu

    ICDE (2011)

  •    

    Large Scale Page-Based Book Similarity Clustering

    Nemanja Spasojevic, Guillaume Poncin

    ICDAR 2011

  •   

    Large-scale community detection on YouTube for Topic Discovery and Exploration

    Ullas Gargi, Wenjun Lu, Vahab Mirrokni, Sangho Yoon

    AAAI Conference on Weblogs and Social Media 2011

  •   

    Learning to Target: What Works for Behavioral Targeting

    Sandeep Pandey, Mohamed Aly, Abraham Bagherjeiran, Andrew Hatch, Peter Ciccolo, Adwait Ratnaparkhi, Martin Zinkevich

    CIKM '11, ACM, Glasgow, Scotland, UK (2011), pp. 1805-1814

  •   

    MRI: Meaningful Interpretations of Collaborative Ratings

    Mahashweta Das, Sihem Amer-Yahia, Gautam Das, Cong Yu

    VLDB (2011)

  •  

    Personalized Social Recommendations - Accurate or Private?

    Ashwin Machanavajjhala, Aleksandra Korolova, Atish Das Sarma

    Very Large Data Bases (VLDB) (2011)

  •   

    Stanford's Distantly-Supervised Slot-Filling System

    Mihai Surdeanu, Sonal Gupta, John Bauer, David McClosky, Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning

    Fourth Text Analysis Conference (TAC 2011)

  •   

    Stanford-UBC Entity Linking at TAC-KBP, Again

    Angel X. Chang, Valentin I. Spitkovsky, Eneko Agirre, Christopher D. Manning

    Fourth Text Analysis Conference (TAC 2011)

  •   

    Strong Baselines for Cross-Lingual Entity Linking

    Angel X. Chang, Valentin I. Spitkovsky

    Fourth Text Analysis Conference (TAC 2011)

  •    

    Suggesting (More) Friends Using the Implicit Social Graph

    Maayan Roth, Tzvika Barenholz, Assaf Ben-David, David Deutscher, Guy Flysher, Avinatan Hassidim, Ilan Horn, Ari Leichtberg, Naty Leiser, Yossi Matias, Ron Merom

    International Conference on Machine Learning (ICML) (2011)

  •    

    Unary Data Structures for Language Models

    Jeffrey Sorensen, Cyril Allauzen

    Interspeech 2011, International Speech Communication Association, pp. 1425-1428

  •   

    A Simple Distant Supervision Approach for the TAC-KBP Slot Filling Task

    Mihai Surdeanu, David McClosky, Julie Tibshirani, John Bauer, Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning

    Third Text Analysis Conference (TAC 2010)

  •   

    AdHeat: An Influence-based Diffusion Model for Propagating Hints to Match Ads

    Hongji Bao, Edward Y. Chang

    Proceedings of WWW2010, IW3C2, pp. 71-80

  •    

    Catching a Viral Video

    Tom Broxton, Yannet Interian, Jon Vaver, Mirjam Wattenhofer

    IEEE SIASP@ICDM 2010

  •    

    Confucius and Its Intelligent Disciples: Integrating Social with Search

    Xiance Si, Edward Y. Chang, Zoltan Gyongyi, Maosong Sun

    Proceedings of VLDB 2010, 36th International Conference on Very Large Data Bases, VLDB Endowment, pp. 1505-1516

  •    

    Improved classification through runoff elections

    Oleg Golubitsky, Stephen M. Watt

    Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, ACM, Boston (2010), pp. 59-64

  •    

    Mining Arabic Business Reviews

    Mohamed Elhawary, Mohamed Elfeky

    IEEE, pp. 1108-1113

  •   

    Mining advertiser-specific user behavior using adfactors

    Nikolay Archak, Vahab S. Mirrokni, S. Muthukrishnan

    WWW (2010), pp. 31-40

  •    

    Overlapping Experiment Infrastructure: More, Better, Faster Experimentation

    Diane Tang, Ashish Agarwal, Deirdre O'Brien, Mike Meyer

    Proceedings 16th Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC (2010), pp. 17-26

  •  

    PSVM: Parallel Support Vector Machines with Incomplete Cholesky Factorization

    Edward Y. Chang, Hongjie Bai, Kaihua Zhu, Hao Wang, Jian Li, Zhihuan Qiu

    Scaling Up Machine Learning, Cambridge University Press (2010)

  •   

    Stanford-UBC Entity Linking at TAC-KBP

    Angel X. Chang, Valentin I. Spitkovsky, Eric Yeh, Eneko Agirre, Christopher D. Manning

    Third Text Analysis Conference (TAC 2010)

  •   

    Ad Quality On TV: Predicting Television Audience Retention

    Yannet Interian, Sundar Dorai-Raj, Igor Naverniouk, P. J. Opalinski, Kaustuv, Dan Zigmond

    Proceedings of ADKDD (2009)

  •   

    An incentive-based architecture for social recommendations

    Rajat Bhattacharjee, Ashish Goel, Konstantinos Kollias

    RecSys '09: Proceedings of the third ACM conference on Recommender systems, ACM, New York, NY, USA (2009), pp. 229-232

  •   

    Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior

    Wen-Yen Chen, Jon Chu, Junyi Luan, Hongjie Bai, Edward Chang

    18th International Conference on World Wide Web (WWW), ACM (2009), pp. 681-690

  •    

    Computers and iPhones and Mobile Phones, oh my! A logs-based comparison of search users on different devices

    Maryam Kamvar, Melanie Kellar, Rajan Patel, Ya Xu

    WWW 2009 MADRID, pp. 801-810

  •   

    Do Viewers Care? Understanding the impact of ad creatives on TV viewing behavior

    Yannet Interian, Kaustuv, Igor Naverniouk, P. J. Opalinski, Sundar Dorai-Raj, Dan Zigmond

    Re:Think 2009

  •   

    Efficient Clustering of Web-Derived Data Sets

    Luís Sarmento, Alexander Kehlenbeck, Eugénio C. Oliveira, Lyle Ungar

    MLDM '09: Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition, Springer-Verlag, Berlin, Heidelberg (2009), pp. 398-412

  •   

    Finding topic trends in digital libraries

    Levent Bolelli, Seyda Ertekin, Ding Zhou, C. Lee Giles

    JCDL '09: Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, ACM, New York, NY, USA (2009), pp. 69-72

  •   

    Going Mini: Extreme Lightweight Spam Filters

    D. Sculley, Gordon V. Cormack

    CEAS 2009: Proceedings of the Sixth Conference on Email and Anti-Spam

  •   

    On the Predictability of Search Trends (manuscript)

    Yair Shimshoni, Niv Efron, Yossi Matias

    Google (2009)

  •    

    PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce

    Biswanath Panda, Joshua S. Herbach, Sugato Basu, Roberto J. Bayardo

    Proceedings of the 35th International Conference on Very Large Data Bases (VLDB-2009)

  •   

    Parallel community detection on large networks with propinquity dynamics

    Yuzhou Zhang, Jianyong Wang, Yi Wang, Lizhu Zhou

    KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, New York, NY, USA (2009), pp. 997-1006

  •   

    Predicting Bounce Rates in Sponsored Search Advertisements

    D. Sculley, Robert Malkin, Sugato Basu, Roberto J. Bayardo

    Proc. of the 15th International ACM-SIGKDD Conference on Knowledge Discovery and Data Mining, ACM (2009), pp. 1325-1334

  •   

    Preference aggregation in group recommender systems for committee decision-making

    Jacob P. Baskin, Shriram Krishnamurthi

    RecSys '09: Proceedings of the third ACM conference on Recommender systems, ACM, New York, NY, USA (2009), pp. 337-340

  •   

    Pricing Guidance in Ad Sale Negotiations: The Print Ads Example

    Adam Juda, S. N. Muthukrishnan, Ashish Rastogi

    The Third Annual International Workshop on Data Mining and Audience Intelligence for Advertising (2009)

  •   

    Rating aggregation in collaborative filtering systems

    Florent Garcin, Boi Faltings, Radu Jurca, Nadine Joswig

    RecSys '09: Proceedings of the third ACM conference on Recommender systems, ACM, New York, NY, USA (2009), pp. 349-352

  •    

    Scalable Attribute-Value Extraction from Semi-Structured Text

    Yuk Wah Wong, Dominic Widdows, Tom Lokovic, Kamal Nigam

    ICDM Workshop on Large-scale Data Mining: Theory and Applications (2009)

  •   

    Stanford-UBC at TAC-KBP

    Eneko Agirre, Angel X. Chang, Daniel S. Jurafsky, Christopher D. Manning, Valentin I. Spitkovsky, Eric Yeh

    Second Text Analysis Conference (TAC 2009)

  •    

    Succinct approximate counting of skewed data

    David Talbot

    IJCAI-09 Proceedings (2009), pp. 1243-1248

  •   

    Text Classification Through Time: Efficient Label Propagation in Time-Based Graphs

    Shumeet Baluja, Deepak Ravichandran, D. Sivakumar

    International Conference on Knowledge Discovery and Information Retrieval (2009)

  •   

    The Unreasonable Effectiveness of Data

    Alon Halevy, Peter Norvig, Fernando Pereira

    IEEE Intelligent Systems, vol. 24 (2009), pp. 8-12

  •   

    Tour the world: a technical demonstration of a web-scale landmark recognition engine

    Yan-Tao Zheng, Ming Zhao, Yang Song, Hartwig Adam, Ulrich Buddemeier, Alessandro Bissacco, Fernando Brucher, Tat-Seng Chua, Hartmut Neven, Jay Yagnik

    MM '09: Proceedings of the 17th ACM international conference on Multimedia, ACM, New York, NY, USA (2009), pp. 961-962

  •    

    Video2Text: Learning to Annotate Video Content

    Hrishikesh Aradhye, George Toderici, Jay Yagnik

    ICDM Workshop on Internet Multimedia Mining (2009)

  •   

    A Social Query Model for Decentralized Search

    Arindam Banerjee, Sugato Basu

    Second ACM Workshop on Social Network Mining and Analysis at the KDD Conference (SNAKDD-08) (2008)

  •   

    Bootstrapping Information Extraction from Semi-structured Web Pages

    Andrew Carlson, Charles Schafer

    ECML/PKDD, Springer Lecture Notes in Computer Science Volume 5211/2008 (2008), pp. 16

  •  

    Constrained Clustering: Advances in Algorithms, Theory, and Applications

    Sugato Basu, Ian Davidson, Kiri Wagstaff

    CRC Press (2008)

  •  

    Detecting Image Spam using Visual Features and Near Duplicate Detection

    Bhaskar Mehta, Saurabh Nangia, Manish Gupta, Wolfgang Nejdl

    Proceedings of WWW 2008

  •   

    Efficient Concept Clustering for Ontology Learning using an Event Life Cycle on the Web

    Sangsoo Sung, Seokkyung Chung, Dennis McLeod

    Proc. 2008 ACM Symposium on Applied Computing, ACM, Fortaleza, Brazil, pp. 2310-2314

  •   

    Exploring a Digital Library Through Key Ideas

    Bill N. Schilit, Okan Kolak

    JCDL, Pittsburgh, Pennsylvania, USA (2008), pp. 177-186

  •   

    Extreme Data Mining

    Sridhar Ramaswamy

    Proceedings 2008 ACM SIGMOD International Conference on Management of Data, ACM, Vancouver, pp. 1-2

  •   

    Modeling Online Reviews with Multi-Grain Topic Models

    Ivan Titov, Ryan McDonald

    17th International World Wide Web Conference (2008)

  •   

    PFP: Parallel FP-Growth for Query Recommendation

    Haoyuan Li, Yi Wang, Dong Zhang, Edward Chang, Ming Zhang

    ACM Recommendation Systems (2008) (to appear)

  •   

    Video Suggestion and Discovery for YouTube: Taking Random Walks Through the View Graph

    Shumeet Baluja, Rohan Seth, D. Sivakumar, Yushi Jing, Jay Yagnik, Shankar Kumar, Deepak Ravichandran, Mohamed Aly

    WWW-2008

  •    

    A Support Vector Approach to Censored Targets

    Pannagadatta Shivaswamy, Wei Chu, Martin Jansche

    Seventh IEEE International Conference on Data Mining (ICDM) (2007), pp. 655-660

  •   

    Clustering Billions of Images with Large Scale Nearest Neighbor Search

    Ting Liu, Chuck Rosenberg, Henry A. Rowley

    IEEE Workshop on Applications of Computer Vision, IEEE (2007)

  •  

    Google Book Search: Document Understanding on a Massive Scale

    L. Vincent

    Proc. Ninth International Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society, Washington, DC (2007), pp. 819-823

  •   

    Mining API patterns as partial orders from source code: from usage scenarios to specifications

    Mithun Acharya, Tao Xie, Jian Pei, Jun Xu

    Proc. ACM SIGSOFT Symposium on The Foundations of Software Engineering, ACM, Dubrovnik, Croatia (2007), pp. 25-34

  •   

    Relational Clustering by Symmetric Convex Coding

    Bo Long, Zhongfei (Mark) Zhang, Xiaoyun Wu, Philip S. Yu

    Proc. 24th ICML, ACM, Corvallis (2007), pp. 569-576

  •   

    Scaling Up All Pairs Similarity Search

    Roberto Bayardo, Yiming Ma, Ramakrishnan Srikant

    Proc. of the 16th Int'l Conf. on the World Wide Web (2007)

  •   

    Cluster Ranking with an Application to Mining Mailbox Networks

    Ziv Bar-Yossef, Ido Guy, Ronny Lempel, Yoelle S. Maarek, Vladimir Soroka

    ICDM (2006), pp. 63-74

  •  

    Dense Subgraph Extraction

    David Gibson, Ravi Kumar, Kevin S. McCurley, Andrew Tomkins

    in: Mining Graph Data, John Wiley & Sons (2006), pp. 411-441

  •   

    Mining for proposal reviewers: lessons learned at the national science foundation

    Seth Hettich, Michael J. Pazzani

    Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Philadelphia, PA (2006), pp. 862-871

  •   

    Mining the Web to Determine Similarity Between Words, Objects, and Communities

    Mehran Sahami

    Proceedings of the 19th International FLAIRS Conference (FLAIRS-2006)

  •   

    New cached-sufficient statistics algorithms for quickly answering statistical questions

    Andrew Moore

    Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Philadelphia, PA (2006), pp. 2

  •    

    STAGGER: Periodicity Mining of Data Streams Using Expanding Sliding Windows

    Mohamed G. Elfeky, Walid G. Aref, Ahmed K. Elmagarmid

    Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), IEEE Computer Society, pp. 188-199

  •  

    Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping

    Mikhail Bilenko, Sugato Basu, Mehran Sahami

    Proceedings of the 5th IEEE International Conference on Data Mining (2005), pp. 58-65

  •   

    Evaluating similarity measures: a large-scale study in the orkut social network

    Ellen Spertus, Mehran Sahami, Orkut Buyukkokten

    Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2005), pp. 678-684

  •   

    Unweaving a web of documents

    R. Guha, Ravi Kumar, D. Sivakumar, Ravi Sundaram

    KDD (2005), pp. 574-579

  •   

    A social network caught in the Web

    Lada A. Adamic, Orkut Buyukkokten, Eytan Adar

    First Monday, vol. 8 (2003)

  •   

    Mining Optimized Gain Rules for Numeric Attributes

    Sergey Brin, Rajeev Rastogi, Kyuseok Shim

    IEEE Trans. Knowl. Data Eng., vol. 15 (2003), pp. 324-338

  •   

    Extracting Patterns and Relations from the World Wide Web

    Sergey Brin

    WebDB (1998), pp. 172-183

  •   

    Scalable Techniques for Mining Causal Structures

    Craig Silverstein, Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman

    VLDB (1998), pp. 594-605