Machine learning models are often used at test time subject to constraints and
trade-offs not present at training time. For example, a computer vision model
operating on an embedded device may need to perform real-time inference, or a
translation model operating on a cell phone may wish to bound its average compute
time in order to be power-efficient. In this work, we describe a mixture-of-experts
model and show how to adjust its test-time resource usage on a per-input basis
using reinforcement learning. We test our method on a small MNIST-based example.
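The abstract does not specify the gating architecture or the reward. The toy sketch below is therefore an illustration under stated assumptions, not the authors' method. It shows one way a per-input gate over experts can be trained with reinforcement learning (plain REINFORCE) to trade accuracy against compute. Everything in it is hypothetical: two experts (a cheap one that only handles "easy" inputs and an expensive one that handles all inputs), the costs in `COSTS`, the penalty `cost_weight`, the easy/hard input split, and the function names `expert_correct`, `prob_expensive` and `train_gate`.

```python
import math
import random

# Hypothetical compute costs for a cheap expert (index 0) and an expensive
# expert (index 1); arbitrary units, not taken from the paper.
COSTS = (1.0, 4.0)


def expert_correct(expert: int, hard: bool) -> bool:
    """Toy stand-in for expert accuracy: the cheap expert solves only easy
    inputs, while the expensive expert solves every input."""
    return expert == 1 or not hard


def prob_expensive(theta: list[float], hard: bool) -> float:
    """Logistic gate with one logit per input type (easy, hard): the
    probability of routing this input to the expensive expert."""
    return 1.0 / (1.0 + math.exp(-theta[int(hard)]))


def train_gate(cost_weight: float, steps: int = 5000, lr: float = 0.1,
               seed: int = 0) -> list[float]:
    """Train the gate with REINFORCE on reward = correct - cost_weight * cost."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]     # gate logits for easy and hard inputs
    baseline = [0.0, 0.0]  # running-average reward per input type
    for _ in range(steps):
        hard = rng.random() < 0.5
        s = int(hard)
        p = prob_expensive(theta, hard)
        expert = 1 if rng.random() < p else 0  # sample a routing decision
        reward = float(expert_correct(expert, hard)) - cost_weight * COSTS[expert]
        advantage = reward - baseline[s]
        baseline[s] += 0.05 * (reward - baseline[s])
        # For a Bernoulli-logistic policy, d/d theta[s] of log pi(expert | input)
        # is (expert - p); scale it by the baseline-corrected reward.
        theta[s] += lr * (expert - p) * advantage
    return theta


if __name__ == "__main__":
    for weight in (0.1, 0.5):
        theta = train_gate(weight)
        print(f"cost_weight={weight}: "
              f"P(expensive | easy)={prob_expensive(theta, False):.2f}, "
              f"P(expensive | hard)={prob_expensive(theta, True):.2f}")
```

In this toy setting, a small `cost_weight` (0.1) should lead the gate to send easy inputs to the cheap expert and hard inputs to the expensive one. A large `cost_weight` (0.5) makes the cheap expert preferable for every input. Changing a single scalar in the reward therefore shifts how much compute the gate chooses to spend per input, which is the kind of trade-off the abstract describes.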