Department of Computer Science Faculty Scholarship and Creative Works

Efficient Learning of Multi-Step Best Response

Bikramjit Banerjee, Tulane University
Jing Peng, Montclair State University

Document Type

Paper

Publication Date

12-1-2005

Abstract

We provide a uniform framework for learning against a recent-history adversary in arbitrary repeated bimatrix games, by modeling such an agent as a Markov Decision Process (MDP). We focus on learning an optimal non-stationary policy in such an MDP over a finite horizon and adapt an existing efficient Monte Carlo-based algorithm for learning optimal policies in such MDPs. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Though this improvement comes at the cost of increased domain knowledge, a simple experiment in the Prisoner's Dilemma game shows that even when no extra domain knowledge is assumed (beyond knowing the opponent's memory size), the error can still be small.
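The modeling idea in the abstract can be illustrated with a small sketch. It is not the paper's Monte Carlo algorithm: it swaps in exact backward induction, and it assumes the opponent's rule is known and deterministic rather than learned. The payoff values, the memory-1 Tit-for-Tat opponent, and the names `PAYOFF`, `tit_for_tat`, and `backward_induction` are all illustrative assumptions. The state is the previous joint action, which is all a memory-1 opponent can condition on.

```python
from itertools import product

C, D = 0, 1  # cooperate, defect
ACTIONS = (C, D)

# Assumed Prisoner's Dilemma payoffs for the learner,
# indexed by (learner_action, opponent_action).
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


def tit_for_tat(state):
    """Hypothetical memory-1 opponent: repeat the learner's previous action."""
    learner_prev, _opp_prev = state
    return learner_prev


def backward_induction(opponent, horizon):
    """Compute an optimal non-stationary policy over a finite horizon.

    States are previous joint actions (learner, opponent). Returns
    (policy, value), where policy[t][state] is the learner's action at
    step t and value[state] is the optimal total reward from step 0.
    """
    states = list(product(ACTIONS, ACTIONS))
    value = {s: 0 for s in states}  # nothing left to earn after the last step
    policy = [None] * horizon
    for t in reversed(range(horizon)):
        new_value, step_policy = {}, {}
        for s in states:
            opp_action = opponent(s)  # opponent reacts to recent history only
            best_action, best_q = None, float("-inf")
            for a in ACTIONS:
                # Immediate reward plus value of the next state (a, opp_action).
                q = PAYOFF[(a, opp_action)] + value[(a, opp_action)]
                if q > best_q:
                    best_action, best_q = a, q
            new_value[s] = best_q
            step_policy[s] = best_action
        value, policy[t] = new_value, step_policy
    return policy, value


policy, value = backward_induction(tit_for_tat, horizon=5)
```

Under these assumptions the computed policy cooperates at every step except the last and defects at the final step, which shows why the optimal finite-horizon policy depends on the time step and is therefore non-stationary.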

Montclair State University Digital Commons Citation

Banerjee, Bikramjit and Peng, Jing, "Efficient Learning of Multi-Step Best Response" (2005). Department of Computer Science Faculty Scholarship and Creative Works. 245.
https://digitalcommons.montclair.edu/compusci-facpubs/245

This document is currently not available here.
