On the Convergence of Regret Minimization Dynamics in Concave Games
41st Annual ACM Symposium on Theory of Computing, STOC, ACM (2009), pp. 523-532
Eyal Even-Dar, Yishay Mansour, Uri Nadav
We consider the standard regret minimization setting, where at each time step the decision maker chooses a distribution over k alternatives and then observes the loss of each alternative. The setting is very similar to the classical online job scheduling setting, with three major differences:
- Information model: in the regret minimization setting, losses are observed only after the action (assigning the job to a machine) is performed, rather than before the action is selected, as assumed in the classical online job scheduling setting.
- Comparison class: in regret minimization the comparison class is the best static algorithm (i.e., a fixed distribution over alternatives), not the optimal offline solution.
- Performance measure: in regret minimization we measure the additive difference to the optimal solution in the comparison class, in contrast to the ratio used in the online job scheduling setting.
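The full-information setting described above admits simple no-regret algorithms. As a hedged illustration (not the paper's own construction), the sketch below uses the standard multiplicative-weights (Hedge) algorithm: at each step it plays a distribution over the k alternatives, observes all k losses, and updates. Its cumulative loss exceeds that of the best static alternative by only an additive O(sqrt(T log k)) term, matching the additive performance measure above. The loss sequence here is synthetic, chosen only for the demonstration.

```python
import math
import random

def hedge(losses, eta):
    """Multiplicative-weights (Hedge) sketch for the full-information
    regret minimization setting: play a distribution over k alternatives,
    then observe the loss of every alternative.

    `losses` is a list of length-k loss vectors with entries in [0, 1];
    returns the algorithm's cumulative expected loss."""
    k = len(losses[0])
    weights = [1.0] * k
    total_loss = 0.0
    for loss in losses:
        z = sum(weights)
        dist = [w / z for w in weights]  # distribution played this step
        total_loss += sum(p * l for p, l in zip(dist, loss))
        # Full-information update: every alternative's loss is observed,
        # so every weight can be updated (no exploration needed).
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, loss)]
    return total_loss

# Synthetic losses: alternative 2 is consistently the best.
random.seed(0)
T, k = 2000, 5
losses = [[0.2 * random.random() if i == 2 else random.random()
           for i in range(k)] for _ in range(T)]

eta = math.sqrt(2 * math.log(k) / T)  # standard tuning for horizon T
alg_loss = hedge(losses, eta)
best_static = min(sum(l[i] for l in losses) for i in range(k))
regret = alg_loss - best_static  # additive difference, per the text
```

Note that for linear losses the best static distribution concentrates on a single best alternative, so comparing against `min` over alternatives is equivalent to comparing against the best distribution; the regret stays well below the O(sqrt(T log k)) bound (about 80 for these parameters).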