Document Type

Article

Publication Date

12-1-2007

Abstract

We present new multiagent learning (MAL) algorithms that follow a general philosophy: converge in policy against some classes of opponents, and otherwise ensure high payoffs. We consider a 3-class breakdown of opponent types: (eventually) stationary, self-play and "other" (see Definition 4) agents. We start with ReDVaLeR, which can satisfy policy convergence against the first two types and no-regret against the third, but needs to know the type of the opponents. This serves as a baseline to delineate the difficulty of achieving these goals. We show that a simple modification of ReDVaLeR yields a new algorithm, RVσ(t), that simultaneously achieves no-regret payoffs in all games and convergence to Nash equilibria in self-play (and to best response against eventually stationary opponents, a corollary of no-regret), without knowing the opponent types, but in a smaller class of games than ReDVaLeR. RVσ(t) effectively ensures the performance of a learner during the process of learning, as opposed to the performance of a learned behavior. We show that the expression for regret of RVσ(t) can have a slightly better form than those of other comparable algorithms like GIGA and GIGA-WoLF though, in contrast, our analysis is in continuous time. Moreover, experiments show that RVσ(t) can converge to an equilibrium in some cases where GIGA and GIGA-WoLF would fail, and to better equilibria where GIGA and GIGA-WoLF converge to undesirable equilibria (coordination games). This important class of coordination games also highlights the key desirability of policy convergence, rather than high average payoffs, as a criterion for MAL in self-play. To our knowledge, this is also the first successful (guaranteed) attempt at policy convergence of a no-regret algorithm in the Shapley game.
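
The no-regret criterion mentioned in the abstract can be made concrete with a small sketch. The code below computes the standard average external regret of a learner in a two-player matrix game: the gap between the payoff of the best single action in hindsight and the expected payoff the learner actually earned. It is a generic, discrete-time illustration only, not any of the algorithms from the paper (whose analysis is in continuous time); the function name `average_regret` and the matching-pennies payoff matrix are assumptions made for the example.

```python
def average_regret(payoff, my_policies, opp_actions):
    """Average external regret of a learner in a two-player matrix game.

    payoff[a][b]   : learner's payoff for playing action a against action b
    my_policies[t] : learner's mixed strategy (list of probabilities) at round t
    opp_actions[t] : opponent's action index at round t
    """
    n_actions = len(payoff)
    rounds = len(opp_actions)

    # Expected payoff the learner actually earned over all rounds.
    earned = sum(
        sum(prob * payoff[a][b] for a, prob in enumerate(policy))
        for policy, b in zip(my_policies, opp_actions)
    )

    # Total payoff of the best single fixed action in hindsight.
    best_fixed = max(
        sum(payoff[a][b] for b in opp_actions) for a in range(n_actions)
    )

    return (best_fixed - earned) / rounds


# Hypothetical example: matching pennies from the row player's perspective.
MATCHING_PENNIES = [[1, -1], [-1, 1]]
```

An algorithm is called no-regret when this quantity is guaranteed to become non-positive in the limit as the number of rounds grows, whatever the opponent does. For instance, against an opponent that always plays action 0, a learner that keeps mixing uniformly retains positive average regret, while one that always plays action 0 has none.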

DOI

10.1007/s10458-007-9013-x

Montclair State University Digital Commons Citation

Banerjee, Bikramjit and Peng, Jing, "Generalized Multiagent Learning with Performance Bound" (2007). Department of Computer Science Faculty Scholarship and Creative Works. 302.
https://digitalcommons.montclair.edu/compusci-facpubs/302

Published Citation

Banerjee, B., & Peng, J. (2007). Generalized multiagent learning with performance bound. Autonomous Agents and Multi-Agent Systems, 15(3), 281-312.

Included in

Computer Sciences Commons

Department of Computer Science Faculty Scholarship and Creative Works

Generalized Multiagent Learning with Performance Bound