# Multi-Objective Weighted Sampling

## Abstract

Key value data sets of the form $\{(x,w_x)\}$ where $w_x >0$ are prevalent.
Common queries over such data are {\em segment $f$-statistics} $Q(f,H) = \sum_{x\in
H}f(w_x)$, specified for a segment $H$ of the keys and a function $f$. Different
choices of $f$ correspond to count, sum, moments, cap, and threshold statistics.
When the data set is large, we can compute a smaller sample from which we can
quickly estimate statistics. A weighted sample of keys taken with respect to
$f(w_x)$ provides estimates with statistically guaranteed quality for
$f$-statistics. Such a sample $S^{(f)}$ can be used to estimate $g$-statistics for
$g\not=f$, but quality degrades with the disparity between $g$ and $f$. In this
paper we address applications that require quality estimates for a set $F$ of
different functions. A naive solution is to compute and work with a different
sample $S^{(f)}$ for each $f\in F$. Instead, this can be achieved more effectively
and seamlessly using a single {\em multi-objective} sample $S^{(F)}$ of a much
smaller size. We review multi-objective sampling schemes and place them in our
context of estimating $f$-statistics. We show that a multi-objective sample for $F$
provides quality estimates for any $f$ that is a positive linear combination of
functions from $F$. We then establish a surprising and powerful result when the
target set $M$ is {\em all} monotone non-decreasing functions, noting that $M$
includes most natural statistics. We provide efficient multi-objective sampling
algorithms for $M$ and show that a sample size of $k \ln n$ (where $n$ is the
number of active keys) provides the same estimation quality, for any $f\in M$, as a
dedicated weighted sample of size $k$ for $f$.