Theoretical Advantages of Lenient Learners in Multiagent Systems
Venue
Proceedings of the Sixth International Conference on Autonomous Agents and Multiagent Systems (AAMAS-07), ACM (2007)
Publication Year
2007
Authors
Liviu Panait, Karl Tuyls
Abstract
This paper presents the dynamics of multiple reinforcement learning agents from an
Evolutionary Game Theoretic (EGT) perspective. We provide a Replicator Dynamics model for
traditional multiagent Q-learning, and we then extend these differential equations
to account for lenient learners: agents that forgive possible mistakes of their
teammates that resulted in lower rewards. We use this extended formal model to
visualize the basins of attraction of both traditional and lenient multiagent
Q-learners in two benchmark coordination problems. The results indicate that
lenience provides learners with more accurate estimates for the utility of their
actions, resulting in higher likelihood of convergence to the globally optimal
solution. In addition, our research supports the strength of EGT as a backbone for
multiagent reinforcement learning.
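
To make the notion of lenience concrete, below is a minimal, illustrative Python sketch (not the paper's formal replicator-dynamics model): two independent Boltzmann Q-learners on the climbing game, a standard benchmark coordination problem. The lenience degree `kappa`, the payoff matrix, and all learning parameters are assumptions chosen for illustration; `kappa = 1` recovers a traditional Q-learner.

```python
import math
import random

# Climbing game payoff matrix (a standard benchmark coordination game);
# rows are agent 1's actions, columns are agent 2's actions.
CLIMB = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]

def boltzmann(q, tau):
    """Boltzmann (softmax) action selection over Q-values."""
    exps = [math.exp(v / tau) for v in q]
    r = random.random() * sum(exps)
    acc = 0.0
    for action, e in enumerate(exps):
        acc += e
        if r <= acc:
            return action
    return len(q) - 1

def run(kappa, episodes=20000, alpha=0.1, tau=0.5):
    """Two independent Q-learners on the climbing game.

    kappa is the degree of lenience: each agent gathers kappa rewards for
    its chosen action (against freshly sampled teammate actions) and updates
    its Q-value toward the maximum of those rewards, ignoring the lower
    ones -- i.e., it 'forgives' the teammate's poorer choices.
    kappa = 1 recovers the traditional (non-lenient) learner.
    """
    q1 = [0.0, 0.0, 0.0]
    q2 = [0.0, 0.0, 0.0]
    for _ in range(episodes):
        a1 = boltzmann(q1, tau)
        a2 = boltzmann(q2, tau)
        # Each agent keeps only the best of kappa sampled rewards for its action.
        r1 = max(CLIMB[a1][boltzmann(q2, tau)] for _ in range(kappa))
        r2 = max(CLIMB[boltzmann(q1, tau)][a2] for _ in range(kappa))
        q1[a1] += alpha * (r1 - q1[a1])
        q2[a2] += alpha * (r2 - q2[a2])
    return q1.index(max(q1)), q2.index(max(q2))

if __name__ == "__main__":
    for kappa in (1, 5):
        joint = [run(kappa) for _ in range(20)]
        optimal = sum(1 for j in joint if j == (0, 0))
        print(f"kappa={kappa}: {optimal}/20 runs end at the optimal joint action (0, 0)")
```

In this sketch, lenience makes each Q-value track the best of several sampled outcomes rather than their average, so an agent's estimate of a good action is not dragged down by a teammate that is still exploring; this is the intuition behind the "more accurate utility estimates" claim, while the paper itself establishes the effect formally through the extended Replicator Dynamics model and basin-of-attraction visualizations.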
