Budget Allocation using Weakly Coupled, Constrained Markov Decision Processes
Venue
Google Research (2016), 39pp. (to appear)
Publication Year
2016
Authors
Craig Boutilier, Tyler Lu
BibTeX
Abstract
We consider the problem of budget (or other resource) allocation in sequential
decision problems involving a large number of concurrently running sub-processes,
whose only interaction is through their gradual consumption of budget (or the
resource in question). We use the case of an advertiser interacting with a large
population of target customers as a primary motivation. We consider two
complementary problems that comprise our key contributions. Our first contribution
addresses the problem of computing MDP value functions as a function of the
available budget. In contrast to standard constrained MDPs—which find optimal value
functions given a fixed expected budget constraint—our aim is to assess the
tradeoff between expected budget spent and the value attained when using that
budget optimally. We show that optimal value functions are concave in budget. More
importantly, in the finite-horizon case, we show there are a finite number of
useful budget levels. This gives rise to piecewise-linear, concave value functions
(piecewise-constant if we restrict to deterministic policies) with an
representation that can be computed readily via dynamic programming. This
representation also supports natural approximations. Our model not only allows the
assessment of budget/value tradeoffs (e.g., to find the “sweet spot” in spend), but
plays an important role in the allocation of budget across competing subprocesses.
Our second contribution is a method for constructing a policy that prescribes the
joint policy to be taken across all sub-processes given the joint state of the
system, subject to a global budget constraint. We cast the problem as a weakly
coupled MDP in which budget is allocated online to the individual subprocesses
based on its observed (initial) state and the subprocess-specific value function.
We show that the budget allocation process can be cast as a multi-item,
multiple-choice knapsack problem (MCKP), which admits an efficient greedy algorithm
to determine optimal allocations. We also discuss the possibility of online,
per-stage re-allocation of budget to adaptively satisfy strict rather than expected
budget constraints.
