Showing Relevant Ads via Lipschitz Context Multi-Armed Bandits
Venue
Thirteenth International Conference on Artificial Intelligence and Statistics, Journal of Machine Learning Research (2010)
Publication Year
2010
Authors
Tyler Lu, Dávid Pál, Martin Pál
BibTeX
Abstract
We study contextual multi-armed bandit problems where the context comes from a
metric space and the payoff satisfies a Lipschitz condition with respect to the
metric. Abstractly, a contextual multi-armed bandit problem models a situation
where, in a sequence of independent trials, an online algorithm chooses, based on a
given context (side information), an action from a set of possible actions so as to
maximize the total payoff of the chosen actions. The payoff depends on both the
action chosen and the context. In contrast, context-free multi-armed bandit
problems, a focus of much previous research, model situations where no side
information is available and the payoff depends only on the action chosen. Our
problem is motivated by sponsored web search, where the task is to display ads to a
user of an Internet search engine based on her search query so as to maximize the
click-through rate (CTR) of the ads displayed. We cast this problem as a contextual
multi-armed bandit problem where queries and ads form metric spaces and the payoff
function is Lipschitz with respect to both the metrics. For any $\epsilon > 0$
we present an algorithm with regret $O(T^{\frac{a+b+1}{a+b+2} + \epsilon})$ where
$a,b$ are the covering dimensions of the query space and the ad space respectively.
We prove a lower bound
$\Omega(T^{\frac{\tilde{a}+\tilde{b}+1}{\tilde{a}+\tilde{b}+2} \epsilon})$ for the
regret of any algorithm where $\tilde{a}, \tilde{b}$ are packing dimensions of the
query spaces and the ad space respectively. For finite spaces or convex bounded
subsets of Euclidean spaces, this gives an almost matching upper and lower bound.
