ESOFTCHECK: REMOVAL OF NON-VITAL CHECKS FOR FAULT TOLERANCE
Venue
Proceedings of the CGO 2009, The Seventh International Symposium on Code Generation and Optimization, IEEE Computer Society, pp. 35-46
Publication Year
2009
Authors
Jing Yu, Maria Jesus Garzaran, Marc Snir
Abstract
As semiconductor technology scales into the deep submicron regime, the occurrence of
transient or soft errors will increase. This will require new approaches to error
detection. Software checking approaches are attractive because they require little
hardware modification and can be easily adjusted to fit different reliability and
performance requirements. Unfortunately, software checking adds a significant
performance overhead. In this paper we present ESoftCheck, a set of compiler
optimization techniques to determine which are the vital checks, that is, the
minimum number of checks that are necessary to detect an error and roll back to a
correct program state. ESoftCheck identifies the vital checks on platforms where
registers are hardware-protected with parity or ECC, when there are redundant
checks and when checks appear in loops. ESoftCheck also provides knobs to trade
reliability for performance based on the support for recovery and the degree of
trustworthiness of the operations. Our experimental results on a Pentium 4 show that
ESoftCheck can obtain a 27.1% performance improvement without losing fault coverage.
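
The idea of dropping checks that appear in loops can be illustrated with a small C sketch. This is not the ESoftCheck implementation, which operates inside a compiler; it is a hand-written illustration under stated assumptions. The function names, the `faults_detected` counter, and the `report_fault` handler are all hypothetical, and the example assumes that software checking is done by duplicating a computation and comparing the two copies. The first function compares on every iteration; the second keeps only a single comparison at loop exit, which still catches a mismatch between the two accumulators before the result is used.

```c
/* Hypothetical fault counter and handler. A real system would roll back
 * to a known-good program state instead of just counting. */
static int faults_detected = 0;

static void report_fault(void) { faults_detected++; }

/* Duplicated accumulation with a check in every iteration.
 * *checks counts how many comparisons were executed.
 * Note: an optimizing compiler may merge s and s_dup; real schemes take
 * care to keep the redundant copy alive. */
static long sum_check_each_iter(const int *a, int n, int *checks) {
    long s = 0, s_dup = 0;
    for (int i = 0; i < n; i++) {
        s += a[i];
        s_dup += a[i];          /* redundant copy of the computation */
        (*checks)++;
        if (s != s_dup) report_fault();
    }
    return s;
}

/* Same duplicated accumulation, but only one check after the loop.
 * A corrupted accumulator stays different from its copy, so the single
 * exit check still observes the mismatch in this simple case. */
static long sum_check_at_exit(const int *a, int n, int *checks) {
    long s = 0, s_dup = 0;
    for (int i = 0; i < n; i++) {
        s += a[i];
        s_dup += a[i];
    }
    (*checks)++;
    if (s != s_dup) report_fault();
    return s;
}
```

For an array of four elements, the first variant executes four comparisons and the second executes one, while both return the same sum. Whether a deferred check is acceptable in practice depends on the recovery support available, which is one of the trade-offs the abstract mentions.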
