Publication Data
Showing Relevant Ads via Lipschitz Context Multi-Armed Bandits
Abstract: We study contextual multi-armed bandit problems where the
context comes from a metric space and the payoff satisfies a Lipschitz condition with
respect to the metric. Abstractly, a contextual multi-armed bandit problem models a
situation where, in a sequence of independent trials, an online algorithm chooses,
based on a given context (side information), an action from a set of possible actions
so as to maximize the total payoff of the chosen actions. The payoff depends on both
the action chosen and the context. In contrast, context-free multi-armed bandit
problems, a focus of much previous research, model situations where no side information
is available and the payoff depends only on the action chosen. Our problem is motivated
by sponsored web search, where the task is to display ads to a user of an Internet
search engine based on her search query so as to maximize the click-through rate (CTR)
of the ads displayed. We cast this problem as a contextual multi-armed bandit problem
where queries and ads form metric spaces and the payoff function is Lipschitz with
respect to both the metrics. For any $\epsilon > 0$ we present an algorithm with
regret $O(T^{\frac{a+b+1}{a+b+2} + \epsilon})$ where $a,b$ are the covering dimensions
of the query space and the ad space respectively. We prove a lower bound
$\Omega(T^{\frac{\tilde{a}+\tilde{b}+1}{\tilde{a}+\tilde{b}+2} \epsilon})$ for the
regret of any algorithm where $\tilde{a}, \tilde{b}$ are packing dimensions of the
query spaces and the ad space respectively. For finite spaces or convex bounded subsets
of Euclidean spaces, this gives an almost matching upper and lower bound.
