Data Management

83 Publications

  •    

    Biperpedia: An Ontology for Search Applications

    Rahul Gupta, Alon Halevy, Xuezhi Wang, Steven Whang, Fei Wu

    Proc. 40th Int'l Conf. on Very Large Data Bases (PVLDB) (2014) (to appear)

  •  

    Diff-Index: Differentiated Index in Distributed Log-Structured Data Stores

    Wei Tan, Sandeep Tata, Yuzhe Tang, Liana Fong

    EDBT (2014) (to appear)

  •    

    From Research to Practice: Experiences Engineering a Production Metadata Database for a Scale Out File System

    Charles Johnson, Kimberly Keeton, Charles B. Morrey III, Craig A. N. Soules, Alistair Veitch, Stephen Bacon, Oskar Batuner, Marcelo Condotta, Hamilton Coutinho, Patrick J. Doyle, Rafael Eichelberger, Hugo Kiehl, Guilherme Magalhaes, James McEvoy, Padmanabhan Nagarajan, Patrick Osborne, Joaquim Souza, Andy Sparkes, Mike Spitzer, Sebastien Tandel, Lincoln Thomas, Sebastian Zangaro

    Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST 2014), USENIX

  •    

    Wikidata: A Free Collaborative Knowledge Base

    Denny Vrandečić, Markus Krötzsch

    Communications of the ACM (2014) (to appear)

  •    

    F1: A Distributed SQL Database That Scales

    Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, Himani Apte

    VLDB (2013)

  •    

    HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm

    Stefan Heule, Marc Nunkesser, Alex Hall

    Proceedings of the EDBT 2013 Conference, ACM, Genoa, Italy (to appear)

  •    

    Online, Asynchronous Schema Change in F1

    Ian Rae, Eric Rollins, Jeff Shute, Sukhdeep Sodhi, Radek Vingralek

    VLDB (2013)

  •    

    Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams

    Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, Shivakumar Venkataraman

    SIGMOD '13: Proceedings of the 2013 international conference on Management of data, ACM, New York, NY, USA, pp. 577-588

  •   

    Recent progress towards an ecosystem of structured data on the Web

    Nitin Gupta, Alon Y. Halevy, Boulos Harb, Heidi Lam, Hongrae Lee, Jayant Madhavan, Fei Wu, Cong Yu

    ICDE 2013: 29th International Conference on Data Engineering, IEEE, pp. 5-8

  •    

    Rolling Up Random Variables in Data Cubes

    Phillip M. Yelland

    Joint Statistical Meetings, American Statistical Association, 732 North Washington Street, Alexandria, VA 22314-1943 (2013) (to appear)

  •    

    An Automatic Blocking Mechanism for Large-Scale De-duplication Tasks

    Anish Das Sarma, Ankur Jain, Ashwin Machanavajjhala, Philip Bohannon

    CIKM (2012)

  •   

    Big Data Storytelling Through Interactive Maps

    Jayant Madhavan, Sreeram Balakrishnan, Kathryn Hurley, Hector Gonzalez, Nitin Gupta, Alon Halevy, Karen Jacqmin-Adams, Heidi Lam, Anno Langen, Hongrae Lee, Rod McChesney, Rebecca Shapley, Warren Shen

    IEEE Data Engineering Bulletin, vol. 35 (2012), pp. 46-54

  •   

    Clydesdale: structured data processing on MapReduce

    Tim Kaldewey, Eugene J. Shekita, Sandeep Tata

    Proceedings of the 15th International Conference on Extending Database Technology, ACM, New York, NY, USA (2012), pp. 15-25

  •    

    Efficient Spatial Sampling of Large Geographical Tables

    Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Y. Halevy

    SIGMOD (2012)

  •   

    Efficient spatial sampling of large geographical tables

    Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy

    Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, pp. 193-204

  •    

    F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business

    Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins, Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat Jegerlehner, Kyle Littlefield, Phoenix Tong

    SIGMOD (2012)

  •    

    Finding Related Tables

    Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Y. Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Cong Yu

    SIGMOD (2012)

  •   

    Finding related tables

    Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Cong Yu

    Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, pp. 817-828

  •  

    Fuzzy Joins Using MapReduce

    Foto N. Afrati, Anish Das Sarma, David Menestrina, Aditya Parameswaran, Jeffrey Ullman

    ICDE (2012) (to appear)

  •   

    Hathi: durable transactions for memory using flash

    Mohit Saxena, Mehul A. Shah, Stavros Harizopoulos, Michael M. Swift, Arif Merchant

    Proceedings of the Eighth International Workshop on Data Management on New Hardware, ACM, New York, NY, USA (2012), pp. 33-38

  •  

    Interactive Regret Minimization

    Danupon Nanongkai, Ashwin Lall, Atish Das Sarma, Kazuhisa Makino

    SIGMOD (2012) (to appear)

  •   

    Interactive regret minimization

    Danupon Nanongkai, Ashwin Lall, Atish Das Sarma, Kazuhisa Makino

    Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, pp. 109-120

  •    

    Processing a Trillion Cells per Mouse Click

    Alex Hall, Olaf Bachmann, Robert Buessow, Silviu-Ionut Ganceanu, Marc Nunkesser

    PVLDB, vol. 5 (2012), pp. 1436-1446

  •   

    Spanner: Google's Globally-Distributed Database

    James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Dale Woodford, Yasushi Saito, Christopher Taylor, Michal Szymaniak, Ruth Wang

    OSDI (2012) (to appear)

  •   

    Symbiosis in scale out networking and data management

    Amin Vahdat

    Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, pp. 579-580

  •   

    Towards an ecosystem of structured data on the web

    Alon Y. Halevy

    Proceedings of the 15th International Conference on Extending Database Technology, ACM, New York, NY, USA (2012), pp. 1-2

  •   

    Computational Journalism: A Call to Arms to Database Researchers

    Sarah Cohen, Chengkai Li, Jun Yang, Cong Yu

    CIDR (2011)

  •   

    Data Integration with Dependent Sources

    Anish Das Sarma, Luna Dong, Alon Halevy

    EDBT (2011)

  •    

    Dremel: Interactive Analysis of Web-Scale Datasets

    Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theodore Vassilakis

    Communications of the ACM, vol. 54 (2011), pp. 114-123

  •   

    Efficiently Encoding Term Co-occurrences in Inverted Indexes

    Marcus Fontoura, Maxim Gurevich, Vanja Josifovski, Sergei Vassilvitskii

    20th ACM Conference on Information and Knowledge Management (CIKM 2011) (to appear)

  •   

    Efficiently Evaluating Graph Constraints in Content-Based Publish/Subscribe

    Andrei Broder, Shirshanka Das, Marcus Fontoura, Bhaskar Ghosh, Vanja Josifovski, Jayavel Shanmugasundaram, Sergei Vassilvitskii

    The 20th International World Wide Web Conference (WWW 2011)

  •  

    Entity-Relationship Queries over Wikipedia

    Xiaonan Li, Chengkai Li, Cong Yu

    ACM Transactions on Intelligent Systems and Technology (2011) (to appear)

  •   

    Evaluation Strategies for Top-k Queries over Memory-Resident Inverted Indexes

    Marcus Fontoura, Vanja Josifovski, Jinhui Liu, Srihari Venkatesan, Xiangfei Zhu, Jason Zien

    The 37th International Conference on Very Large Databases (VLDB 2011) (to appear)

  •   

    Factorization-based Lossless Compression of Inverted Indices

    George Beskales, Marcus Fontoura, Maxim Gurevich, Vanja Josifovski, Sergei Vassilvitskii

    20th ACM Conference on Information and Knowledge Management (CIKM 2011) (to appear)

  •    

    Graph cube: on warehousing and OLAP multidimensional networks

    Peixiang Zhao, Xiaolei Li, Dong Xin, Jiawei Han

    SIGMOD - Proceedings of the 2011 International Conference on Management of Data, ACM, New York, NY

  •  

    Hyper-local, directions-based ranking of places

    Petros Venetis, Hector Gonzalez, Alon Y. Halevy, Christian S. Jensen

    Proceedings of VLDB (2011), pp. 290-30

  •    

    Maestro: Quality-of-Service in Large Disk Arrays

    Arif Merchant, Mustafa Uysal, Pradeep Padala, Xiaoyun Zhu, Sharad Singhal, Kang Shin

    Proceedings of the 8th ACM international conference on Autonomic computing (ICAC), ACM, New York, NY, USA (2011), pp. 245-254

  •   

    Representative Skylines using Threshold-based Preference Distributions

    Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Richard J. Lipton, Jim Xu

    International Conference on Data Engineering (ICDE) (2011)

  •    

    Still All On One Server: Perforce at Scale

    Dan Bloch

    2011 Perforce User Conference

  •   

    Adaptive query processing in data stream management systems under limited memory resources

    Fatima Farag, Moustafa A. Hammad, Reda Alhajj

    Proceedings of the 3rd Workshop on Ph.D. Students in Information and Knowledge Management (PIKM 2010), Toronto, Ontario, Canada, October 30, 2010, ACM, pp. 9-16

  •   

    Automatically incorporating new sources in keyword search-based data integration

    Partha Pratim Talukdar, Zachary G. Ives, Fernando Pereira

    SIGMOD Conference, ACM Press (2010), pp. 387-398

  •    

    Collaborative Environmental In Situ Data Collection: Experiences and Opportunities for Ambient Data Integration

    David Thau

    On the Move to Meaningful Internet Systems: OTM 2010 Workshops, Lecture Notes in Computer Science, pp. 119

  •   

    Evolution and future directions of large-scale storage and computation systems at Google

    Jeffrey Dean

    SoCC '10: Proceedings of the 1st ACM symposium on Cloud computing, ACM, New York, NY, USA (2010), pp. 1-1

  •    

    Google Fusion Tables: Data Management, Integration, and Collaboration in the Cloud

    Hector Gonzalez, Alon Halevy, Christian Jensen, Anno Langen, Jayant Madhavan, Rebecca Shapley, Warren Shen

    Proceedings of the ACM Symposium on Cloud Computing (SOCC) (2010)

  •    

    Google Fusion Tables: Web-Centered Data Management and Collaboration

    Hector Gonzalez, Alon Halevy, Christian Jensen, Anno Langen, Jayant Madhavan, Rebecca Shapley, Warren Shen, Jonathan Goldberg-Kidon

    Proceedings of the ACM SIGMOD conference, ACM (2010)

  •   

    Pregel: a system for large-scale graph processing

    Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski

    Proceedings of the 2010 international conference on Management of data, ACM, New York, NY, USA, pp. 135-146

  •    

    The Case Against Data Lock-in

    Brian W. Fitzpatrick, JJ Lueck

    Communications of the ACM, vol. 53, no. 11 (2010), pp. 42-46

  •  

    Threshold query optimization for uncertain data

    Yinian Qi, Rohit Jain, Sarvjeet Singh, Sunil Prabhakar

    Special Interest Group on Management of Data (SIGMOD) (2010)

  •   

    VoR-Tree: R-trees with Voronoi Diagrams for Efficient Processing of Spatial Nearest Neighbor Queries

    Mehdi Sharifzadeh, Cyrus Shahabi

    Proceedings of VLDB (2010)

  •    

    DRAM Errors in the Wild: A Large-Scale Field Study

    Bianca Schroeder, Eduardo Pinheiro, Wolf-Dietrich Weber

    SIGMETRICS (2009)

  •   

    Data Integration with Uncertainty

    Xin Luna Dong, Alon Halevy, Cong Yu

    The VLDB Journal, vol. 18 (2009), pp. 469-500

  •   

    Data Modeling in Dataspace Support Platforms

    Anish Das Sarma, Xin (Luna) Dong, Alon Y. Halevy

    Conceptual Modeling: Foundations and Applications, Springer-Verlag, Berlin, Heidelberg (2009), pp. 122-138

  •   

    Engineering autonomic systems

    Joseph L. Hellerstein

    ICAC '09: Proceedings of the 6th international conference on Autonomic computing, ACM, New York, NY, USA (2009), pp. 75-76

  •  

    Exploring Schema Repositories with Schemr

    Kuang Chen, Jayant Madhavan, Alon Halevy

    Proceedings of the ACM SIGMOD conference (2009), pp. 1095-1098

  •   

    Representing uncertain data: models, properties, and algorithms

    Anish Das Sarma, Omar Benjelloun, Alon Halevy, Shubha Nabar, Jennifer Widom

    The VLDB Journal, vol. 18 (2009), pp. 989-1019

  •   

    The Claremont report on database research

    Rakesh Agrawal, Anastasia Ailamaki, Philip A. Bernstein, Eric A. Brewer, Michael J. Carey, Surajit Chaudhuri, Anhai Doan, Daniela Florescu, Michael J. Franklin, Hector Garcia-Molina, Johannes Gehrke, Le Gruenwald, Laura M. Haas, Alon Y. Halevy, Joseph M. Hellerstein, Yannis E. Ioannidis, Hank F. Korth, Donald Kossmann, Samuel Madden, Roger Magoulas, Beng Chin Ooi, Tim O'Reilly, Raghu Ramakrishnan, Sunita Sarawagi, Michael Stonebraker, Alexander S. Szalay, Gerhard Weikum

    Commun. ACM, vol. 52 (2009), pp. 56-65

  •    

    Using Hoarding to Increase Availability in Shared File Systems

    Jochen Hollmann, Per Stenström

    Eighth IEEE/ACIS International Conference on Computer and Information Science (ICIS 2009), IEEE, pp. 422-429

  •   

    Weighted Proximity Best-Joins for Information Retrieval

    Risi Thonangi, Hao He, Anhai Doan, Haixun Wang, Jun Yang

    ICDE '09: Proceedings of the 2009 IEEE International Conference on Data Engineering, IEEE Computer Society, Washington, DC, USA, pp. 234-245

  •   

    Bootstrapping Pay-as-you-go Data Integration Systems

    Anish Das Sarma, Xin Dong, Alon Halevy

    Proc. ACM SIGMOD International Conference on Management of Data, ACM, Vancouver (2008), pp. 861-874

  •   

    Pay-as-you-go User Feedback for Dataspace Systems

    Shawn R. Jeffery, Michael J. Franklin, Alon Y. Halevy

    Proc. ACM SIGMOD International Conference on Management of Data, ACM, Vancouver (2008), pp. 847-860

  •   

    The Space Complexity of Processing XML Twig Queries over Indexed Documents

    Mirit Shalem, Ziv Bar-Yossef

    Proceedings of the 24th International Conference on Data Engineering (ICDE) (2008), pp. 824-832

  •   

    Ad Hoc Distributed Simulations

    Richard Fujimoto, Michael Hunter, Jason Sirichoke, Mahesh Palekar, Hoe Kim, Wonho Suh

    21st International Workshop on Principles of Advanced and Distributed Simulation (PADS'07), IEEE Computer Society (2007), pp. 15-24

  •   

    An Information Avalanche

    Vint Cerf

    IEEE Computer, vol. 40, no. 1 (2007), pp. 104-105

  •   

    Building MEMS-Based Storage Systems for Streaming Media

    Raju Rangaswami, Zoran Dimitrijević, Edward Chang, Klaus Schauser

    ACM Transactions on Storage, vol. 9 (2007)

  •   

    Estimating Statistical Aggregates on Probabilistic Data Streams

    T. S. Jayram, Andrew McGregor, S. Muthukrishnan, Erik Vee

    Principles of Database Systems (PODS) 2007, ACM, Beijing, China, pp. 243-252

  •    

    Failure Trends in a Large Disk Drive Population

    Eduardo Pinheiro, Wolf-Dietrich Weber, Luiz André Barroso

    5th USENIX Conference on File and Storage Technologies (FAST 2007), pp. 17-29

  •   

    Indexing Dataspaces

    Xin Dong, Alon Halevy

    Proc. ACM SIGMOD, ACM, Beijing (2007)

  •   

    Life on the Edge: Monitoring and Running a Very Large Perforce Installation

    Dan Bloch

    Perforce User Conference 2007

  •   

    Optimal Traversal Planning in Road Networks with Navigational Constraints

    Leyla Kazemi, Cyrus Shahabi, Mehdi Sharifzadeh, Luc Vincent

    ACM GIS, ACM (2007)

  •  

    Query Suspend and Resume

    Badrish Chandramouli, Chris Bond, Shivnath Babu, Jun Yang

    Proc. ACM SIGMOD, ACM, Beijing (2007)

  •    

    Web-scale Data Integration: You can only afford to Pay As You Go

    Jayant Madhavan, Shawn R. Jeffery, Shirley Cohen, Xin (Luna) Dong, David Ko, Cong Yu, Alon Halevy

    CIDR (2007)

  •  

    Achieving completion time guarantees in an opportunistic data migration scheme

    Jianyong Zhang, Prasenjit Sarkar, Anand Sivasubramaniam

    ACM SIGMETRICS Performance Evaluation Review, vol. 33 (2006), pp. 11-16

  •  

    Data integration: the teenage years

    Alon Halevy, Anand Rajaraman, Joann Ordille

    Proc. 32nd International Conference on Very Large Databases, VLDB, Seoul, Korea (2006), pp. 9-16

  •   

    Data management projects at Google

    Wilson Hsieh, Jayant Madhavan, Rob Pike

    SIGMOD Conference (2006), pp. 725-726

  •   

    On-the-fly Sharing for Streamed Aggregation

    Sailesh Krishnamurthy, Chung Wu, Michael J. Franklin

    SIGMOD Conference (2006), pp. 623-634

  •   

    Principles of dataspace systems

    Alon Y. Halevy, Michael J. Franklin, David Maier

    PODS (2006), pp. 1-9

  •   

    Semantically-smart disk systems: past, present, and future

    Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Lakshmi N. Bairavasundaram, Timothy E. Denehy, Florentina I. Popovici, Vijayan Prabhakaran, Muthian Sivathanu

    ACM SIGMETRICS Performance Evaluation Review, vol. 33 (2006), pp. 29-35

  •   

    Sender Reputation in a Large Webmail Service

    Bradley Taylor

    Third Conference on Email and Anti-Spam (CEAS 2006)

  •    

    Structured Data Meets the Web: A Few Observations

    Jayant Madhavan, Alon Halevy, Shirley Cohen, Xin (Luna) Dong, Shawn R. Jeffery, David Ko, Cong Yu

    Data Engineering Bulletin (2006)

  •  

    ULDBs: databases with uncertainty and lineage

    Omar Benjelloun, Anish Das Sarma, Alon Halevy, Jennifer Widom

    Proc. 32nd International Conference on Very Large Databases, VLDB, Seoul, Korea (2006), pp. 953-964

  •  

    PADX: Querying large-scale ad hoc data with XQuery

    Mary Fernandez, Kathleen Fisher, Robert Gruber, Yitzhak Mandelbaum

    Proceedings of PLAN-X 2006: Workshop on Programming Language technologies for XML (2006)

  •   

    Networking proposal for TR2

    Gerhard Wesp

    ISO (2005)