We provide a comparative study of several widely used off-policy estimators
(Empirical Average, Basic Importance Sampling and Normalized Importance Sampling),
detailing the regimes in which each of them is suboptimal. We then
exhibit properties that optimal estimators should possess. In the case where examples
have been gathered using multiple policies, we show that fused estimators dominate
basic ones but can still be improved.
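
For concreteness, the three estimators named above can be sketched in a simplified, context-free setting with a finite action set. This is an illustrative sketch under stated assumptions, not necessarily the exact definitions used in this study: the function names, the arrays `target_probs` and `logging_probs`, and in particular the reading of the Empirical Average as a per-action mean reward weighted by the target policy are all assumptions made for illustration.

```python
import numpy as np

def empirical_average(actions, rewards, target_probs):
    """Assumed definition: per-action mean reward, weighted by the target policy."""
    value = 0.0
    for a in range(len(target_probs)):
        mask = actions == a
        if mask.any():  # actions never logged contribute 0 in this sketch
            value += target_probs[a] * rewards[mask].mean()
    return value

def basic_importance_sampling(actions, rewards, target_probs, logging_probs):
    """Mean of rewards reweighted by target/logging probability ratios."""
    weights = target_probs[actions] / logging_probs[actions]
    return np.mean(weights * rewards)

def normalized_importance_sampling(actions, rewards, target_probs, logging_probs):
    """Same weights as above, but normalized by their sum instead of n."""
    weights = target_probs[actions] / logging_probs[actions]
    return np.sum(weights * rewards) / np.sum(weights)

# Hypothetical logged data: two actions, uniform logging policy.
actions = np.array([0, 0, 0, 1])
rewards = np.array([1.0, 0.0, 1.0, 1.0])
logging_probs = np.array([0.5, 0.5])
target_probs = np.array([0.2, 0.8])

ea = empirical_average(actions, rewards, target_probs)
bis = basic_importance_sampling(actions, rewards, target_probs, logging_probs)
nis = normalized_importance_sampling(actions, rewards, target_probs, logging_probs)
```

On this toy sample the three estimates differ, because the empirical action frequencies deviate from the logging probabilities; that gap is what separates the basic and normalized variants.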