RVσ(t): A Unifying Approach to Performance and Convergence in Online Multiagent Learning
We present a new multiagent learning algorithm, RVσ(t), that can guarantee both no-regret performance (in all games) and policy convergence (in some games of arbitrary size). Unlike its predecessor ReDVaLeR, it (1) does not need to distinguish whether its opponents are in self-play or are otherwise non-stationary, and (2) is allowed to know its portion of any equilibrium, which, we argue, leads to convergence in some games in addition to no-regret. Although the regret of RVσ(t) is analyzed in continuous time, we show that it grows more slowly than in other no-regret techniques such as GIGA and GIGA-WoLF. We also show that RVσ(t) can converge to coordinated behavior in coordination games, whereas GIGA and GIGA-WoLF may converge to poorly coordinated (mixed) behaviors.
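The abstract's central notion of "no-regret" can be made concrete with a small sketch. The function below is not the paper's RVσ(t) algorithm; it is a generic, hypothetical helper that measures the external regret of an arbitrary play history in a repeated matrix game, i.e., how much better the single best fixed action would have done in hindsight. A no-regret learner keeps this quantity sublinear in the number of rounds.

```python
# Hedged sketch (not RVσ(t) itself): external regret of a play history
# in a repeated two-player matrix game, from the row player's view.

def external_regret(payoff_rows, play_history):
    """payoff_rows[i][j]: row player's payoff for own action i vs. opponent action j.
    play_history: list of (own_action, opponent_action) pairs, one per round."""
    # Cumulative payoff the learner actually earned.
    earned = sum(payoff_rows[i][j] for i, j in play_history)
    # Best cumulative payoff any single fixed action would have earned
    # against the same opponent sequence.
    best_fixed = max(
        (sum(payoff_rows[a][j] for _, j in play_history)
         for a in range(len(payoff_rows))),
        default=0,
    )
    return best_fixed - earned

# Matching-pennies example: the learner stubbornly plays action 0 while
# the opponent plays action 1 for 10 rounds; regret grows linearly.
pennies = [[1, -1], [-1, 1]]
history = [(0, 1)] * 10
print(external_regret(pennies, history))  # → 20 (fixed action 1 earns +10 vs. -10)
```

Against such a stubborn strategy, regret grows linearly in the number of rounds; a no-regret algorithm instead adapts its mixed strategy so that average regret vanishes as play continues.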
MSU Digital Commons Citation
Banerjee, Bikramjit and Peng, Jing, "RVσ(t): A Unifying Approach to Performance and Convergence in Online Multiagent Learning" (2006). Department of Computer Science Faculty Scholarship and Creative Works. 521.