What can performance counters do for memory subsystem analysis?
ACM SIGPLAN Workshop on Memory Systems Performance &
Correctness (MSPC'08), ACM, Seattle (2008), pp.
Nowadays, all major processors provide a set of performance counters that capture
micro-architectural information, such as the number of elapsed cycles, cache
misses, or instructions executed. Counters can be found in processor cores,
elsewhere on the processor die, in chipsets, and in I/O cards. They can provide a
wealth of information about how software uses the hardware. Many processors now
support events that measure, precisely and with very limited overhead, the traffic
between a core and the memory subsystem. From these events, it is possible to
compute the average load latency and the bus bandwidth utilization. This
information can be used to improve code quality and thread placement so as to
maximize hardware utilization.
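To make the derived metrics concrete, here is a minimal sketch, assuming the raw counts have already been read over one measurement interval. The field names (`load_latency_cycles`, `sampled_loads`, `bus_transactions`, etc.), the 64-byte transfer size, and the use of the core clock to convert cycles to seconds are illustrative assumptions, not the events or methods of any particular processor; real event names and their exact semantics differ between microarchitectures.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    # Hypothetical raw counts collected over one measurement interval.
    load_latency_cycles: int    # summed latency (cycles) of the sampled loads
    sampled_loads: int          # number of loads whose latency was recorded
    bus_transactions: int       # memory bus transactions in the interval
    bytes_per_transaction: int  # bytes moved per transaction (assumed cache line)
    elapsed_cycles: int         # core cycles elapsed in the interval
    core_freq_hz: float         # core clock, used to convert cycles to seconds


def average_load_latency(s: CounterSample) -> float:
    """Average latency per sampled load, in core cycles."""
    if s.sampled_loads == 0:
        return 0.0
    return s.load_latency_cycles / s.sampled_loads


def bus_bandwidth(s: CounterSample) -> float:
    """Observed memory bus bandwidth, in bytes per second."""
    seconds = s.elapsed_cycles / s.core_freq_hz
    return s.bus_transactions * s.bytes_per_transaction / seconds


def bus_utilization(s: CounterSample, peak_bytes_per_s: float) -> float:
    """Fraction of the (assumed) peak bus bandwidth actually used."""
    return bus_bandwidth(s) / peak_bytes_per_s


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample = CounterSample(
        load_latency_cycles=30_000,
        sampled_loads=100,
        bus_transactions=1_000_000,
        bytes_per_transaction=64,
        elapsed_cycles=2_000_000_000,
        core_freq_hz=2.0e9,
    )
    print("avg load latency (cycles):", average_load_latency(sample))
    print("bus bandwidth (B/s):", bus_bandwidth(sample))
    print("bus utilization:", bus_utilization(sample, peak_bytes_per_s=6.4e9))
```

Keeping the raw counts separate from the derived ratios makes it easy to recompute the metrics per thread or per interval, which is what a thread-placement decision would typically compare.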