# Multi-Armed Bandits with Metric Movement Costs

### Venue

NIPS (2017) (to appear)

### Publication Year

2017

### Authors

Roi Livni, Tomer Koren, Yishay Mansour

### BibTeX

## Abstract

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there
is a fixed and known metric on the action space that determines a cost for
switching between any pair of actions. The loss of the online learner has two
components: the first is the usual loss of the selected actions, and the second is
an additional loss due to switching between actions. Our main contribution gives a
tight characterization of the expected minimax regret in this setting, in terms of
a complexity measure~$C$ of the underlying metric which depends on its covering
numbers. In finite metric spaces with $k$ actions, we give an efficient algorithm
that achieves regret of the form $\tO(\max\set{C^{1/3}T^{2/3},\sqrt{kT}})$, and
show that this is the best possible. Our regret bound generalizes previous known
regret bounds for some special cases: (i) the unit-switching cost regret
$\widetilde{\Theta}(\max\set{k^{1/3}T^{2/3},\sqrt{kT}})$ where $C=\Theta(k)$, and
(ii) the interval metric with regret
$\widetilde{\Theta}(\max\set{T^{2/3},\sqrt{kT}})$ where $C=\Theta(1)$. For infinite
metrics spaces with Lipschitz loss functions, we derive a tight regret bound of
$\widetilde{\Theta}(T^{\frac{d+1}{d+2}})$ where $d \ge 1$ is the Minkowski
dimension of the space, which is known to be tight even when there are no switching
costs.