Visualizing Statistical Mix Effects and Simpson's Paradox
Venue
Proceedings of IEEE InfoVis 2014, IEEE (to appear)
Publication Year
2014
Authors
Zan Armstrong, Martin Wattenberg
Abstract
We discuss how “mix effects” can surprise users of visualizations and potentially
lead them to incorrect conclusions. This statistical issue (also known as “omitted
variable bias” or, in extreme cases, as “Simpson’s paradox”) is widespread and can
affect any visualization in which the quantity of interest is an aggregated value
such as a weighted sum or average. Our first contribution is to document how mix
effects can be a serious issue for visualizations, and we analyze how mix effects
can cause problems in a variety of popular visualization techniques, from bar
charts to treemaps. Our second contribution is a new technique, the “comet chart,”
that is meant to ameliorate some of these issues.
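
As a rough illustration of the mix effect described above, the following sketch uses made-up numbers (not taken from the paper) for two hypothetical segments, "A" and "B", observed in two periods. The value rises within every segment, yet the weighted average falls because the mix of weights shifts toward the lower-valued segment. The segment names, weights, and the helper `weighted_average` are all assumptions introduced for this example.

```python
# Illustrative, made-up data: segment -> (weight, value).
# These numbers are hypothetical and do not come from the paper.
before = {"A": (90, 10.0), "B": (10, 2.0)}
after = {"A": (50, 11.0), "B": (50, 3.0)}


def weighted_average(groups):
    """Return the weight-weighted mean of the values in `groups`."""
    total_weight = sum(weight for weight, _ in groups.values())
    weighted_sum = sum(weight * value for weight, value in groups.values())
    return weighted_sum / total_weight


avg_before = weighted_average(before)
avg_after = weighted_average(after)

# True when the value increased in every individual segment.
every_segment_rose = all(after[name][1] > before[name][1] for name in before)

print(f"aggregate before: {avg_before:.1f}")
print(f"aggregate after:  {avg_after:.1f}")
print(f"every segment rose: {every_segment_rose}")
```

A chart that plots only the two aggregate values would suggest a decline, even though no individual segment declined. This is the kind of reversal that the abstract groups under Simpson's paradox.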
