Taming Hardware Event Samples for FDO Compilation
Abstract: Feedback-directed optimization (FDO) is effective in
improving application runtime performance, but has not been widely adopted due to the
tedious dual-compilation model, the difficulties in generating representative training
data sets, and the high runtime overhead of profile collection. The use of
hardware-event sampling to generate estimated edge profiles overcomes these drawbacks.
Yet, hardware event samples are typically not precise at the instruction or basic-block
granularity. These inaccuracies cost performance relative to
instrumentation-based FDO. In this paper, we use multiple hardware event profiles and
supervised learning techniques to derive heuristics that improve the precision of
basic-block-level sample profiles, and to further refine the smoothing algorithms used
to construct edge profiles. We demonstrate that sampling-based FDO can achieve an
average of 78% of the performance gains obtained using instrumentation-based exact edge
profiles for SPEC2000 benchmarks, matching or beating instrumentation-based FDO in many
cases. The overhead of collection is only 0.74% on average, while compiler-based
instrumentation incurs 6.8%–53.5% overhead (and 10× overhead on an industrial web
search application), and dynamic instrumentation incurs 28.6%–1639.2% overhead.
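To illustrate what "constructing edge profiles" from basic-block counts involves, the sketch below applies simple flow-conservation propagation: for each block, the counts on its incoming edges and on its outgoing edges should each sum to the block's count, so whenever exactly one edge in either group is still unknown, its count can be solved for. This is a generic textbook technique, not the smoothing algorithm described in this paper. The function name `propagate_edge_counts` and its input format are hypothetical, and the clamp at zero is only a crude stand-in for real smoothing.

```python
from collections import defaultdict


def propagate_edge_counts(block_counts, edges, known=None):
    """Infer edge counts from basic-block counts via flow conservation.

    Hypothetical helper for illustration only.
    block_counts: dict mapping block -> estimated execution count
    edges: list of (src, dst) CFG edges
    known: optional dict mapping (src, dst) -> count already fixed
    Returns a dict (src, dst) -> count for every edge it could resolve.
    """
    counts = dict(known or {})
    out_edges = defaultdict(list)
    in_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append((src, dst))
        in_edges[dst].append((src, dst))

    changed = True
    while changed:  # each pass that changes something fixes at least one new edge
        changed = False
        for block, total in block_counts.items():
            # Outgoing and incoming edges must each sum to the block's count.
            for group in (out_edges[block], in_edges[block]):
                unknown = [e for e in group if e not in counts]
                if len(unknown) == 1:
                    rest = sum(counts[e] for e in group if e in counts)
                    # Noisy samples can make the residual negative; clamp it.
                    counts[unknown[0]] = max(total - rest, 0)
                    changed = True
    return counts


if __name__ == "__main__":
    # Diamond CFG: A branches to B or C, and both rejoin at D.
    blocks = {"A": 100, "B": 70, "C": 30, "D": 100}
    cfg = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
    print(propagate_edge_counts(blocks, cfg))
```

On the diamond CFG, every edge resolves: A→B and B→D get 70, while A→C and C→D get 30. With real sample-derived counts, a block's in-flow and out-flow usually disagree, so the order in which edges are resolved changes the result; that sensitivity is one motivation for more principled smoothing, such as formulations based on minimum-cost flow.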
