Trading off Accuracy for Speed in PowerDrill

Alex Hall
Alexandru Tudorica
Filip Buruiana
Reimar Hofmann
Silviu-Ionut Ganceanu
Thomas Hofmann
ICDE 2016: International Conference on Data Engineering, Chicago, USA (Sep 19-20, 2016), World Academy of Science, Engineering and Technology, pp. 2121-2132

Abstract

In-memory column stores are becoming more and more widespread, making interactive data analysis feasible for ever-growing amounts of data. However, data volumes in many big-data scenarios still exceed the available memory, so interactive response times require smart algorithms. PowerDrill [13] is such a system, used Google-internally for data discovery in log data. In this paper we present results from the "fast approximations" initiative, in which we explore possibilities for optimizing performance at the expense of accuracy. In one of the optimizations we use some of the theory behind data sketches to considerably improve the memory efficiency of the data-encoding scheme. For particularly expensive fields in our largest datasets this saves a factor of 4.5 compared to a standard production-grade compression algorithm. We additionally evaluate the effects of sampling on accuracy and performance, and we propose a simple heuristic for estimating the accuracy of individual result values. We show that annotating result values as accurate (or not) is essential for the usefulness of intermediate results computed over sub-samples: only with these annotations can our users start interpreting intermediate results before the final results are available. For a large set of queries this effectively brings the 95th-percentile latency down from 30 to 4 seconds.
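
The abstract does not describe the encoding scheme itself, but the data-sketch theory it alludes to is commonly associated with the HyperLogLog family. As a rough, hypothetical illustration of that theory, the minimal Python sketch below estimates the number of distinct values in a field from a few kilobytes of state; the class, parameter choices, and hash function are assumptions made for this example, not PowerDrill's implementation.

    import hashlib

    def hash64(value):
        """64-bit hash of a string (a stand-in for whatever hash the engine uses)."""
        return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")

    class HyperLogLog:
        """Minimal HyperLogLog cardinality sketch with 2**p registers."""

        def __init__(self, p=12):
            self.p = p
            self.m = 1 << p                 # number of registers (4096 here)
            self.registers = [0] * self.m   # a few KB of state in total

        def add(self, value):
            h = hash64(value)
            idx = h >> (64 - self.p)                # first p bits pick a register
            rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
            # Rank = position of the leftmost 1-bit in the remaining bits.
            rank = (64 - self.p) - rest.bit_length() + 1
            self.registers[idx] = max(self.registers[idx], rank)

        def estimate(self):
            # Standard raw HLL estimate with bias correction for large m;
            # the small-range correction is omitted to keep the sketch short.
            alpha = 0.7213 / (1 + 1.079 / self.m)
            return alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)

On a high-cardinality string field, such a cheap distinct-count estimate could, for instance, guide how many bits to reserve per value in a dictionary encoding; that tie-in to the actual encoding scheme is speculation based only on the abstract.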
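
Similarly, the abstract states that a heuristic annotates each result value as accurate or not, but gives no formula. The sketch below shows the general shape of such an annotation over a Bernoulli sub-sample: scale each group's sampled count by the inverse sampling rate, and flag the value as accurate only when its sample count clears a threshold. The function name, threshold, and scaling rule are hypothetical, not the paper's heuristic.

    import random
    from collections import defaultdict

    def grouped_counts_with_flags(rows, key, sampling_rate, min_sample=100):
        """Per-group counts over a Bernoulli sub-sample, each annotated as
        accurate or not; min_sample is an illustrative threshold only."""
        sampled = defaultdict(int)
        for row in rows:
            if random.random() < sampling_rate:     # keep each row w.p. sampling_rate
                sampled[row[key]] += 1

        results = {}
        for group, n in sampled.items():
            estimate = round(n / sampling_rate)     # inverse-probability scaling
            # Flag a value as accurate only when enough sampled rows back it:
            # small sampled counts make the scaled estimate very noisy.
            results[group] = (estimate, n >= min_sample)
        return results

    # Hypothetical usage: approximate counts per country over a 1% sample.
    # grouped_counts_with_flags(log_rows, "country", sampling_rate=0.01)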