Data Mining

113 Publications

  •    

    A Framework for Benchmarking Entity-Annotation Systems

    Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita

    Proceedings of the International World Wide Web Conference (WWW) (Practice & Experience Track), ACM (2013) (to appear)

  •    

    Classifying YouTube Channels: a Practical System

    Vincent Simonet

    Proceedings of the Web of Linked Entities Workshop 2013, ACM (to appear)

  •    

    Crowd-Sourced Call Identification and Suppression

    Daniel V. Klein, Dean K. Jackson

    Federal Trade Commission Robocall Challenge (2013)

  •   

    Distributed Large-scale Natural Graph Factorization

    Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, Alexander J Smola

    Proceedings of the 22nd International World Wide Web Conference (WWW 2013) (to appear)

  •    

    Efficient and Accurate Label Propagation on Large Graphs and Label Sets

    Michele Covell, Shumeet Baluja

    Proceedings International Conference on Advances in Multimedia, IARIA (2013)

  •   

    Focused Matrix Factorization for Audience Selection in Display Advertising

    Bhargav Kanagal, Amr Ahmed, Sandeep Pandey, Vanja Josifovski, Lluis Garcia-Pueyo, Jeff Yuan

    Proceedings of the 29th International Conference on Data Engineering (ICDE) (2013) (to appear)

  •  

    KDD tutorial: The Dataminer Guide to Scalable Mixed-Membership and Nonparametric Bayesian Models

    Amr Ahmed, Alexander J Smola

    ACM conference on Knowledge Discovery and Data Mining (KDD) (2013) (to appear)

  •   

    Latent Factor Models with Additive Hierarchically-smoothed User Preferences

    Amr Ahmed, Bhargav Kanagal, Sandeep Pandey, Vanja Josifovski, Lluis Garcia-Pueyo

    Proceedings of The 6th ACM International Conference on Web Search and Data Mining (WSDM) (2013) (to appear)

  •    

    Semantic Queries by Example

    Lipyeow Lim, Haixun Wang, Min Wang

    Proceedings of the 16th International Conference on Extending Database Technology (EDBT 2013) (to appear)

  •  

    The Nested Chinese Restaurant Franchise Process: User Tracking and Document Modeling

    Amr Ahmed, Liangjie Hong, Alexander J Smola

    International Conference on Machine Learning (ICML) (2013) (to appear)

  •  

    Understanding Latency of Black-Box Service Workloads

    Darja Krushevskaja

    WWW 2013 (to appear)

  •   

    A Cross-Lingual Dictionary for English Wikipedia Concepts

    Valentin I. Spitkovsky, Angel X. Chang

    Eighth International Conference on Language Resources and Evaluation (LREC 2012)

  •    

    An Integrated Framework for Spatio-Temporal-Textual Search and Mining

    Bingsheng Wang, Haili Dong, Arnold Boedihardjo, Chang-Tien Lu, Harland Yu, Ing-Ray Chen, Jing Dai

    20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), ACM, 2 Penn Plaza, Suite 701, New York, NY 10121, pp. 570-573

  •   

    Budget Optimization for Online Campaigns with Positive Carryover Effects

    Nikolay Archak, Vahab S. Mirrokni, S. Muthukrishnan

    WINE (2012), pp. 86-99

  •    

    Dynamic Covering for Recommendation Systems

    Ioannis Antonellis, Anish Das Sarma, Shaddin Dughmi

    CIKM (2012)

  •    

    Extracting Unambiguous Keywords from Microposts Using Web and Query Logs Data

    Davi Reis, Felipe Goldstein, Frederico Quintao

    Making sense of Microposts (at WWW 2012)

  •   

    FastEx: Hash Clustering with Exponential Families

    Amr Ahmed, Sujith Ravi, Shravan Narayanamurthy, Alex Smola

    Proceedings of the 26th Conference on Neural Information Processing Systems (NIPS) (2012)

  •    

    Look Who I Found: Understanding the Effects of Sharing Curated Friend Groups

    Lujun Fang, Alex Fabrikant, Kristen LeFevre

    Proceedings of ACM Web Science 2012, ACM, pp. 137-146

  •    

    MedLDA: Maximum Margin Supervised Topic Models

    Jun Zhu, Amr Ahmed, Eric P. Xing

    Journal of Machine Learning Research (2012) (to appear)

  •  

    Multi-skill Collaborative Teams based on Densest Subgraphs

    Amita Gajewar, Atish Das Sarma

    SDM (2012) (to appear)

  •   

    Nowcasting the macroeconomy with search engine data

    Hal R. Varian

    Proceedings of the fifth ACM international conference on Web search and data mining, ACM, New York, NY, USA (2012), pp. 1-2

  •  

    Online Selection of Diverse Results

    Debmalya Panigrahi, Atish Das Sarma, Gagan Aggarwal, Andrew Tomkins

    WSDM (2012) (to appear)

  •  

    Overlapping clusters for distributed computation

    Reid Andersen, David Gleich, Vahab Mirrokni

    ACM Conference on Web Search and Data Mining (WSDM) (2012)

  •   

    PageRank on an evolving graph

    Bahman Bahmani, Ravi Kumar, Mohammad Mahdian, Eli Upfal

    KDD (2012), pp. 24-32

  •   

    Spotting fake reviewer groups in consumer reviews

    Arjun Mukherjee, Bing Liu, Natalie Glance

    Proceedings of the 21st international conference on World Wide Web, ACM, New York, NY, USA (2012), pp. 191-200

  •    

    The YouTube Social Network

    Mirjam Wattenhofer, Roger Wattenhofer, Zack Zhu

    Sixth International AAAI Conference on Weblogs and Social Media (ICWSM 2012) (to appear)

  •   

    Understanding cyclic trends in social choices

    Anish Das Sarma, Sreenivas Gollapudi, Rina Panigrahy, Li Zhang

    Proceedings of the fifth ACM international conference on Web search and data mining, ACM, New York, NY, USA (2012), pp. 593-602

  •   

    V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors

    Ahmed Metwally, Christos Faloutsos

    PVLDB Proceedings of the VLDB Endowment, vol. 5 (2012) (to appear)

  •    

    Vote calibration in community question-answering systems

    Bee-Chung Chen, Anirban Dasgupta, Xuanhui Wang, Jie Yang

    SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (2012), pp. 781-790

  •    

    Web-Scale Multi-Task Feature Selection for Behavioral Targeting

    Amr Ahmed, Mohamed Aly, Abhimanyu Das, Alex Smola, Tasos Anastasakos

    Proceedings of The 21st ACM International Conference on Information and Knowledge Management (CIKM), ACM (2012) (to appear)

  •   

    YouTube around the world: geographic popularity of videos

    Anders Brodersen, Salvatore Scellato, Mirjam Wattenhofer

    Proceedings of the 21st international conference on World Wide Web, ACM, New York, NY, USA (2012), pp. 241-250

  •   

    Your Two Weeks of Fame and Your Grandmother's

    James Cook, Atish Das Sarma, Alex Fabrikant, Andrew Tomkins

    WWW (2012)

  •    

    A Tale of Two (Similar) Cities: Inferring City Similarity Through Geo-Spatial Query Log Analysis

    Rohan Seth, Michele Covell, Deepak Ravichandran, D. Sivakumar, Shumeet Baluja

    Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (2011)

  •    

    Catching a viral video

    Tom Broxton, Yannet Interian, Jon Vaver, Mirjam Wattenhofer

    Journal of Intelligent Information Systems (2011), pp. 1-19

  •    

    Detecting Adversarial Advertisements in the Wild

    D. Sculley, Matthew Eric Otey, Michael Pohl, Bridget Spitznagel, John Hainsworth, Yunkai Zhou

    Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD (2011)

  •  

    Efficient Search Engine Measurements

    Ziv Bar-Yossef, Maxim Gurevich

    ACM Transactions on the Web, vol. 5, no. 4 (2011), pp. 18

  •    

    Efficient Spectral Neighborhood Blocking for Entity Resolution

    Liangcai Shu, Aiyou Chen, Ming Xiong, Weiyi Meng

    International Conference on Data Engineering 2011 (ICDE), IEEE, pp. 1-12

  •   

    Estimating the Number of Users behind IPs for Combating Abusive Traffic

    Ahmed Metwally, Matt Paduano

    SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2 Penn Plaza, New York, NY 10121-0799 (2011), pp. 249-257

  •    

    Fast Algorithms for Finding Extremal Sets

    Roberto J. Bayardo, Biswanath Panda

    Proc. of the 2011 SIAM Int'l Conf. on Data Mining (to appear)

  •  

    Frequent Pattern Discovery and Association Rule Mining of XML Data

    Qin Ding, Gnanasekaran Sundarraj

    XML Data Mining: Models, Methods, and Applications, IGI Publishing (2011) (to appear)

  •   

    Influence Maximization in Social Networks When Negative Opinions May Emerge and Propagate

    Alex Collins

    SIAM 2011 International Conference on Data Mining, Society for Industrial and Applied Mathematics (SIAM), 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688, USA, pp. 379-390

  •   

    Interactive Itinerary Planning

    Senjuti Basu-Roy, Gautam Das, Sihem Amer-Yahia, Cong Yu

    ICDE (2011)

  •    

    Large Scale Page-Based Book Similarity Clustering

    Nemanja Spasojevic, Guillaume Poncin

    ICDAR 2011

  •   

    Large-scale community detection on YouTube for Topic Discovery and Exploration

    Ullas Gargi, Wenjun Lu, Vahab Mirrokni, Sangho Yoon

    AAAI Conference on Weblogs and Social Media 2011

  •   

    Learning to Target: What Works for Behavioral Targeting

    Sandeep Pandey, Mohamed Aly, Abraham Bagherjeiran, Andrew Hatch, Peter Ciccolo, Adwait Ratnaparkhi, Martin Zinkevich

    CIKM '11, ACM, Glasgow, Scotland, UK (2011), pp. 1805-1814

  •   

    MRI: Meaningful Interpretations of Collaborative Ratings

    Mahashweta Das, Sihem Amer-Yahia, Gautam Das, Cong Yu

    VLDB (2011)

  •  

    Personalized Social Recommendations - Accurate or Private?

    Ashwin Machanavajjhala, Aleksandra Korolova, Atish Das Sarma

    Very Large Data Bases (VLDB) (2011)

  •   

    Stanford's Distantly-Supervised Slot-Filling System

    Mihai Surdeanu, Sonal Gupta, John Bauer, David McClosky, Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning

    Fourth Text Analysis Conference (TAC 2011)

  •   

    Stanford-UBC Entity Linking at TAC-KBP, Again

    Angel X. Chang, Valentin I. Spitkovsky, Eneko Agirre, Christopher D. Manning

    Fourth Text Analysis Conference (TAC 2011)

  •   

    Strong Baselines for Cross-Lingual Entity Linking

    Angel X. Chang, Valentin I. Spitkovsky

    Fourth Text Analysis Conference (TAC 2011)

  •    

    Suggesting (More) Friends Using the Implicit Social Graph

    Maayan Roth, Tzvika Barenholz, Assaf Ben-David, David Deutscher, Guy Flysher, Avinatan Hassidim, Ilan Horn, Ari Leichtberg, Naty Leiser, Yossi Matias, Ron Merom

    International Conference on Machine Learning (ICML) (2011)

  •    

    Unary Data Structures for Language Models

    Jeffrey Sorensen, Cyril Allauzen

    Interspeech 2011, International Speech Communication Association, pp. 1425-1428

  •   

    A Simple Distant Supervision Approach for the TAC-KBP Slot Filling Task

    Mihai Surdeanu, David McClosky, Julie Tibshirani, John Bauer, Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning

    Third Text Analysis Conference (TAC 2010)

  •   

    AdHeat: An Influence-based Diffusion Model for Propagating Hints to Match Ads

    Hongji Bao, Edward Y. Chang

    Proceedings of WWW2010, IW3C2, pp. 71-80

  •    

    Catching a Viral Video

    Tom Broxton, Yannet Interian, Jon Vaver, Mirjam Wattenhofer

    IEEE SIASP@ICDM 2010

  •    

    Confucius and Its Intelligent Disciples: Integrating Social with Search

    Xiance Si, Edward Y. Chang, Zoltan Gyongyi, Maosong Sun

    Proceedings of VLDB 2010, 36th International Conference on Very Large Data Bases, VLDB Endowment, pp. 1505-1516

  •    

    Improved classification through runoff elections

    Oleg Golubitsky, Stephen M. Watt

    Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, ACM, Boston (2010), pp. 59-64

  •    

    Mining Arabic Business Reviews

    Mohamed Elhawary, Mohamed Elfeky

    IEEE, pp. 1108-1113

  •   

    Mining advertiser-specific user behavior using adfactors

    Nikolay Archak, Vahab S. Mirrokni, S. Muthukrishnan

    WWW (2010), pp. 31-40

  •    

    Overlapping Experiment Infrastructure: More, Better, Faster Experimentation

    Diane Tang, Ashish Agarwal, Deirdre O'Brien, Mike Meyer

    Proceedings 16th Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC (2010), pp. 17-26

  •  

    PSVM: Parallel Support Vector Machines with Incomplete Cholesky Factorization

    Edward Y. Chang, Hongjie Bai, Kaihua Zhu, Hao Wang, Jian Li, Zhihuan Qiu

    Scaling Up Machine Learning, Cambridge University Press (2010)

  •   

    Stanford-UBC Entity Linking at TAC-KBP

    Angel X. Chang, Valentin I. Spitkovsky, Eric Yeh, Eneko Agirre, Christopher D. Manning

    Third Text Analysis Conference (TAC 2010)

  •   

    Ad Quality On TV: Predicting Television Audience Retention

    Yannet Interian, Sundar Dorai-Raj, Igor Naverniouk, P. J. Opalinski, Kaustuv, Dan Zigmond

    Proceedings of ADKDD (2009)

  •   

    An incentive-based architecture for social recommendations

    Rajat Bhattacharjee, Ashish Goel, Konstantinos Kollias

    RecSys '09: Proceedings of the third ACM conference on Recommender systems, ACM, New York, NY, USA (2009), pp. 229-232

  •   

    Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior

    Wen-Yen Chen, Jon Chu, Junyi Luan, Hongjie Bai, Edward Chang

    18th International Conference on World Wide Web (WWW), ACM (2009), pp. 681-690

  •    

    Computers and iPhones and Mobile Phones, oh my! A logs-based comparison of search users on different devices

    Maryam Kamvar, Melanie Kellar, Rajan Patel, Ya Xu

    WWW 2009, Madrid, pp. 801-810

  •   

    Do Viewers Care? Understanding the impact of ad creatives on TV viewing behavior

    Yannet Interian, Kaustuv, Igor Naverniouk, P. J. Opalinski, Sundar Dorai-Raj, Dan Zigmond

    Re:Think 2009

  •   

    Efficient Clustering of Web-Derived Data Sets

    Luís Sarmento, Alexander Kehlenbeck, Eugénio C. Oliveira, Lyle Ungar

    MLDM '09: Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition, Springer-Verlag, Berlin, Heidelberg (2009), pp. 398-412

  •   

    Finding topic trends in digital libraries

    Levent Bolelli, Seyda Ertekin, Ding Zhou, C. Lee Giles

    JCDL '09: Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, ACM, New York, NY, USA (2009), pp. 69-72

  •   

    Going Mini: Extreme Lightweight Spam Filters

    D. Sculley, Gordon V. Cormack

    CEAS 2009: Proceedings of the Sixth Conference on Email and Anti-Spam

  •   

    On the Predictability of Search Trends (manuscript)

    Yair Shimshoni, Niv Efron, Yossi Matias

    Google (2009)

  •    

    PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce

    Biswanath Panda, Joshua S. Herbach, Sugato Basu, Roberto J. Bayardo

    Proceedings of the 35th International Conference on Very Large Data Bases (VLDB-2009)

  •   

    Parallel community detection on large networks with propinquity dynamics

    Yuzhou Zhang, Jianyong Wang, Yi Wang, Lizhu Zhou

    KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, New York, NY, USA (2009), pp. 997-1006

  •   

    Predicting Bounce Rates in Sponsored Search Advertisements

    D. Sculley, Robert Malkin, Sugato Basu, Roberto J. Bayardo

    Proc. of the 15th International ACM-SIGKDD Conference on Knowledge Discovery and Data Mining, ACM (2009), pp. 1325-1334

  •   

    Preference aggregation in group recommender systems for committee decision-making

    Jacob P. Baskin, Shriram Krishnamurthi

    RecSys '09: Proceedings of the third ACM conference on Recommender systems, ACM, New York, NY, USA (2009), pp. 337-340

  •   

    Pricing Guidance in Ad Sale Negotiations: The Print Ads Example

    Adam Juda, S. N. Muthukrishnan, Ashish Rastogi

    The Third Annual International Workshop on Data Mining and Audience Intelligence for Advertising (2009)

  •   

    Rating aggregation in collaborative filtering systems

    Florent Garcin, Boi Faltings, Radu Jurca, Nadine Joswig

    RecSys '09: Proceedings of the third ACM conference on Recommender systems, ACM, New York, NY, USA (2009), pp. 349-352

  •    

    Scalable Attribute-Value Extraction from Semi-Structured Text

    Yuk Wah Wong, Dominic Widdows, Tom Lokovic, Kamal Nigam

    ICDM Workshop on Large-scale Data Mining: Theory and Applications (2009)

  •   

    Stanford-UBC at TAC-KBP

    Eneko Agirre, Angel X. Chang, Daniel S. Jurafsky, Christopher D. Manning, Valentin I. Spitkovsky, Eric Yeh

    Second Text Analysis Conference (TAC 2009)

  •    

    Succinct approximate counting of skewed data

    David Talbot

    IJCAI-09 Proceedings (2009), pp. 1243-1248

  •   

    Text Classification Through Time: Efficient Label Propagation in Time-Based Graphs

    Shumeet Baluja, Deepak Ravichandran, D. Sivakumar

    International Conference on Knowledge Discovery and Information Retrieval (2009)

  •   

    The Unreasonable Effectiveness of Data

    Alon Halevy, Peter Norvig, Fernando Pereira

    IEEE Intelligent Systems, vol. 24 (2009), pp. 8-12

  •   

    Tour the world: a technical demonstration of a web-scale landmark recognition engine

    Yan-Tao Zheng, Ming Zhao, Yang Song, Hartwig Adam, Ulrich Buddemeier, Alessandro Bissacco, Fernando Brucher, Tat-Seng Chua, Hartmut Neven, Jay Yagnik

    MM '09: Proceedings of the seventeenth ACM international conference on Multimedia, ACM, New York, NY, USA (2009), pp. 961-962

  •    

    Video2Text: Learning to Annotate Video Content

    Hrishikesh Aradhye, George Toderici, Jay Yagnik

    ICDM Workshop on Internet Multimedia Mining (2009)

  •   

    A Social Query Model for Decentralized Search

    Arindam Banerjee, Sugato Basu

    Second ACM Workshop on Social Network Mining and Analysis at the KDD Conference (SNAKDD-08) (2008)

  •   

    Bootstrapping Information Extraction from Semi-structured Web Pages

    Andrew Carlson, Charles Schafer

    ECML/PKDD, Springer Lecture Notes in Computer Science Volume 5211/2008 (2008), pp. 16

  •  

    Constrained Clustering: Advances in Algorithms, Theory, and Applications

    Sugato Basu, Ian Davidson, Kiri Wagstaff

    CRC Press (2008)

  •  

    Detecting Image Spam using Visual Features and Near Duplicate Detection

    Bhaskar Mehta, Saurabh Nangia, Manish Gupta, Wolfgang Nejdl

    Proceedings of WWW 2008

  •   

    Efficient Concept Clustering for Ontology Learning using an Event Life Cycle on the Web

    Sangsoo Sung, Seokkyung Chung, Dennis McLeod

    Proc. 2008 ACM Symposium on Applied Computing, ACM, Fortaleza, Brazil, pp. 2310-2314

  •   

    Exploring a Digital Library Through Key Ideas

    Bill N. Schilit, Okan Kolak

    JCDL, Pittsburgh, Pennsylvania, USA (2008), pp. 177-186

  •   

    Extreme Data Mining

    Sridhar Ramaswamy

    Proceedings 2008 ACM SIGMOD International Conference on Management of Data, ACM, Vancouver, pp. 1-2

  •   

    Modeling Online Reviews with Multi-Grain Topic Models

    Ivan Titov, Ryan McDonald

    17th International World Wide Web Conference (2008)

  •   

    PFP: Parallel FP-Growth for Query Recommendation

    Haoyuan Li, Yi Wang, Dong Zhang, Edward Chang, Ming Zhang

    ACM Conference on Recommender Systems (RecSys) (2008) (to appear)

  •   

    Video Suggestion and Discovery for YouTube: Taking Random Walks Through the View Graph

    Shumeet Baluja, Rohan Seth, D. Sivakumar, Yushi Jing, Jay Yagnik, Shankar Kumar, Deepak Ravichandran, Mohamed Aly

    WWW-2008

  •    

    A Support Vector Approach to Censored Targets

    Pannagadatta Shivaswamy, Wei Chu, Martin Jansche

    Seventh IEEE International Conference on Data Mining (ICDM) (2007), pp. 655-660

  •   

    Clustering Billions of Images with Large Scale Nearest Neighbor Search

    Ting Liu, Chuck Rosenberg, Henry A. Rowley

    IEEE Workshop on Applications of Computer Vision, IEEE (2007)

  •  

    Google Book Search: Document Understanding on a Massive Scale

    L. Vincent

    Proc. Ninth International Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society, Washington, DC (2007), pp. 819-823

  •   

    Mining API patterns as partial orders from source code: from usage scenarios to specifications

    Mithun Acharya, Tao Xie, Jian Pei, Jun Xu

    Proc. ACM SIGSOFT Symposium on The Foundations of Software Engineering, ACM, Dubrovnik, Croatia (2007), pp. 25-34

  •   

    Relational Clustering by Symmetric Convex Coding

    Bo Long, Zhongfei (Mark) Zhang, Xiaoyun Wu, Philip S. Yu

    Proc. 24th ICML, ACM, Corvallis (2007), pp. 569-576

  •   

    Scaling Up All Pairs Similarity Search

    Roberto Bayardo, Yiming Ma, Ramakrishnan Srikant

    Proc. of the 16th Int'l Conf. on the World Wide Web (2007)

  •   

    Cluster Ranking with an Application to Mining Mailbox Networks

    Ziv Bar-Yossef, Ido Guy, Ronny Lempel, Yoelle S. Maarek, Vladimir Soroka

    ICDM (2006), pp. 63-74

  •  

    Dense Subgraph Extraction

    David Gibson, Ravi Kumar, Kevin S. McCurley, Andrew Tomkins

    in: Mining Graph Data, John Wiley & Sons (2006), pp. 411-441

  •   

    Mining for proposal reviewers: lessons learned at the National Science Foundation

    Seth Hettich, Michael J. Pazzani

    Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Philadelphia, PA (2006), pp. 862-871

  •   

    Mining the Web to Determine Similarity Between Words, Objects, and Communities

    Mehran Sahami

    Proceedings of the 19th International FLAIRS Conference (FLAIRS-2006)

  •   

    New cached-sufficient statistics algorithms for quickly answering statistical questions

    Andrew Moore

    Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Philadelphia, PA (2006), pp. 2

  •    

    STAGGER: Periodicity Mining of Data Streams Using Expanding Sliding Windows

    Mohamed G. Elfeky, Walid G. Aref, Ahmed K. Elmagarmid

    Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), IEEE Computer Society, pp. 188-199

  •  

    Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping

    Mikhail Bilenko, Sugato Basu, Mehran Sahami

    Proceedings of the 5th IEEE International Conference on Data Mining (2005), pp. 58-65

  •   

    Evaluating similarity measures: a large-scale study in the orkut social network

    Ellen Spertus, Mehran Sahami, Orkut Buyukkokten

    Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2005), pp. 678-684

  •   

    Unweaving a web of documents

    R. Guha, Ravi Kumar, D. Sivakumar, Ravi Sundaram

    KDD (2005), pp. 574-579

  •   

    A social network caught in the Web

    Lada A. Adamic, Orkut Buyukkokten, Eytan Adar

    First Monday, vol. 8 (2003)

  •   

    Mining Optimized Gain Rules for Numeric Attributes

    Sergey Brin, Rajeev Rastogi, Kyuseok Shim

    IEEE Trans. Knowl. Data Eng., vol. 15 (2003), pp. 324-338

  •   

    Extracting Patterns and Relations from the World Wide Web

    Sergey Brin

    WebDB (1998), pp. 172-183

  •   

    Scalable Techniques for Mining Causal Structures

    Craig Silverstein, Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman

    VLDB (1998), pp. 594-605