Taming Hardware Event Samples for FDO Compilation
Venue
Proceedings of International Symposium on Code Generation and Optimization (CGO) (2010)
Publication Year
2010
Authors
Dehao Chen, Neil Vachharajani, Robert Hundt, Shih-wei Liao, Vinodha Ramasamy, Paul Yuan, Wenguang Chen, Weiming Zheng
Abstract
Feedback-directed optimization (FDO) is effective in improving application runtime
performance, but has not been widely adopted due to the tedious dual-compilation
model, the difficulties in generating representative training data sets, and the
high runtime overhead of profile collection. The use of hardware-event sampling to
generate estimated edge profiles overcomes these drawbacks. Yet, hardware event
samples are typically not precise at the instruction or basic-block granularity.
These inaccuracies lead to missed performance when compared to
instrumentation-based FDO. In this paper, we use multiple hardware event profiles
and supervised learning techniques to generate heuristics for improved precision of
basic-block-level sample profiles, and to further improve the smoothing algorithms
used to construct edge profiles. We demonstrate that sampling-based FDO can achieve
an average of 78% of the performance gains obtained using instrumentation-based
exact edge profiles for SPEC2000 benchmarks, matching or beating
instrumentation-based FDO in many cases. The overhead of collection is only 0.74%
on average, while compiler-based instrumentation incurs 6.8%–53.5% overhead (and
10x overhead on an industrial web search application), and dynamic instrumentation
incurs 28.6%–1639.2% overhead.
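
To illustrate what constructing an edge profile from basic-block counts involves, the following is a minimal Python sketch. It is not the paper's method: the abstract does not describe the heuristics or smoothing algorithms in detail, so the sketch shows only plain flow-conservation propagation, which works when the block counts are already consistent. The function name `infer_edge_counts`, the diamond-shaped control-flow graph, and the counts are all hypothetical.

```python
def infer_edge_counts(block_counts, edges, entry, exit_):
    """Derive edge counts from basic-block counts via flow conservation.

    Assumption (illustrative only): a block's count equals the sum of its
    incoming edge counts and the sum of its outgoing edge counts. Whenever
    exactly one edge around a block is still unknown, it can be solved for.
    """
    out_edges = {b: [] for b in block_counts}
    in_edges = {b: [] for b in block_counts}
    for src, dst in edges:
        out_edges[src].append((src, dst))
        in_edges[dst].append((src, dst))

    known = {}
    changed = True
    while changed:
        changed = False
        for block, count in block_counts.items():
            # The entry block has no incoming constraint and the exit
            # block has no outgoing constraint within the function.
            groups = []
            if block != exit_:
                groups.append(out_edges[block])
            if block != entry:
                groups.append(in_edges[block])
            for group in groups:
                unknown = [e for e in group if e not in known]
                if len(unknown) == 1:
                    solved = sum(known[e] for e in group if e in known)
                    # Clamp at zero: sampled counts are noisy and may
                    # not balance exactly.
                    known[unknown[0]] = max(count - solved, 0)
                    changed = True
    return known


# Hypothetical diamond CFG: A branches to B or C, both rejoin at D.
block_counts = {"A": 100, "B": 70, "C": 30, "D": 100}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
edge_counts = infer_edge_counts(block_counts, edges, entry="A", exit_="D")
print(edge_counts)
```

With consistent counts like these, every edge is recovered exactly (for instance, edge A→B gets 70 and A→C gets 30). Sampled block counts generally do not balance this neatly, which is why a smoothing step of the kind the abstract mentions is needed before or during edge-profile construction.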
