Date of Award

5-2017

Document Type

Thesis

Degree Name

Master of Science (MS)

College/School

College of Science and Mathematics

Department/Program

Computer Science

Thesis Sponsor/Dissertation Chair/Project Chair

Jing Peng

Committee Member

Anna Feldman

Committee Member

Dajin Wang

Subject(s)

Machine learning, Intelligent agents (Computer software), Board games

Abstract

Stratego is a two-player, non-stochastic, imperfect-information strategy game in which players try to locate and capture the opponent's flag. At the outset of each game, players deploy their pieces in any arrangement they choose. Throughout play, each player knows the positions of the opponent’s pieces, but not the specific identities of the opponent’s pieces. The game therefore involves deduction, bluffing, and a degree of invention in addition to the sort of planning familiar to perfect-information games like chess or backgammon.

Developing a strong A.I. player presents three major challenges. Firstly, a Stratego program must maintain states of belief about the opponent’s pieces as well as beliefs about the opponent’s beliefs. Beliefs must be updated according to in-game events. We propose to solve this using Bayesian probability theory and Bayesian networks.
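
As a minimal sketch of the kind of belief update described above (an illustration only, not the thesis's model), the following Python applies Bayes' rule to a single unidentified opposing piece. The function names and the 0/1 likelihood for the event "the piece moved" are assumptions made for the example. The prior is taken from the piece counts of the classic 40-piece game, and it ignores the dependencies between pieces that a full Bayesian network would capture.

```python
from typing import Dict

# Piece counts per side in the classic 40-piece game.
PIECE_COUNTS: Dict[str, int] = {
    "flag": 1, "bomb": 6, "spy": 1, "scout": 8, "miner": 5,
    "sergeant": 4, "lieutenant": 4, "captain": 4, "major": 3,
    "colonel": 2, "general": 1, "marshal": 1,
}


def prior_from_counts(counts: Dict[str, int]) -> Dict[str, float]:
    """Prior over a hidden piece's rank, proportional to remaining counts."""
    total = sum(counts.values())
    return {rank: n / total for rank, n in counts.items()}


def bayes_update(prior: Dict[str, float],
                 likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior P(rank | event), proportional to P(event | rank) * P(rank)."""
    unnormalized = {r: p * likelihood.get(r, 0.0) for r, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("event has zero probability under the prior")
    return {r: v / total for r, v in unnormalized.items()}


# Hypothetical likelihood for the event "this piece moved": flags and bombs
# never move; every other rank is assumed equally likely to have moved.
MOVED_LIKELIHOOD = {
    rank: 0.0 if rank in ("flag", "bomb") else 1.0 for rank in PIECE_COUNTS
}

prior = prior_from_counts(PIECE_COUNTS)
posterior = bayes_update(prior, MOVED_LIKELIHOOD)
```

After the update, the probability mass that the prior assigned to the flag and the bombs is redistributed over the 33 movable pieces. A more realistic likelihood would also reflect how players tend to move each rank.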

Secondly, any turn-based game-playing program must perform tree search as part of its planning and move-making routine. Search in perfect-information games such as chess has been studied extensively and has produced a wealth of algorithms and heuristics to expedite the process. Stochastic and imperfect-information games, however, have received less general attention, though Schaeffer et al. have made a significant effort to revisit this domain. Interestingly, the same family of algorithms (Ballard’s Star-1 and Star-2) used in the stochastic perfect-information game of backgammon can be used in the deterministic, imperfect-information domain of Stratego. The technical challenge here, just as in the stochastic domain, is to optimize node cutoffs.
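
To make the idea of node cutoffs concrete, the following is a generic sketch of Star-1 pruning at chance nodes, assuming all leaf values lie in a known interval [lo, hi]. It is not the thesis's implementation: the tuple-based tree encoding and the function names are hypothetical, and in Stratego a chance node would stand for uncertainty about a hidden piece's identity rather than a dice roll. Each child of a chance node is searched with a window derived from the parent's window, the values already found, and the best and worst cases for the children not yet searched.

```python
# Node encodings (hypothetical, for illustration):
#   ("leaf", value)
#   ("max", [child, ...]) / ("min", [child, ...])
#   ("chance", [(probability, child), ...])  with probabilities summing to 1


def expectimax(node):
    """Reference search with no pruning."""
    kind, body = node
    if kind == "leaf":
        return body
    if kind == "max":
        return max(expectimax(c) for c in body)
    if kind == "min":
        return min(expectimax(c) for c in body)
    return sum(p * expectimax(c) for p, c in body)


def star1(node, alpha, beta, lo, hi):
    """Fail-hard alpha-beta search with Star-1 cutoffs at chance nodes."""
    kind, body = node
    if kind == "leaf":
        return min(max(body, alpha), beta)
    if kind == "max":
        best = alpha
        for child in body:
            best = max(best, star1(child, best, beta, lo, hi))
            if best >= beta:
                return beta
        return best
    if kind == "min":
        best = beta
        for child in body:
            best = min(best, star1(child, alpha, best, lo, hi))
            if best <= alpha:
                return alpha
        return best
    # Chance node: 'done' is the weighted sum of values found so far and
    # 'remaining' is the probability mass of children not yet searched.
    done, remaining = 0.0, 1.0
    for p, child in body:
        remaining -= p
        # Child values at or below child_alpha force the node to fail low,
        # and values at or above child_beta force it to fail high, even if
        # every unsearched child takes its extreme value (hi or lo).
        child_alpha = (alpha - done - hi * remaining) / p
        child_beta = (beta - done - lo * remaining) / p
        v = star1(child, max(child_alpha, lo), min(child_beta, hi), lo, hi)
        if v <= child_alpha:
            return alpha
        if v >= child_beta:
            return beta
        done += p * v
    return done


tree = ("max", [
    ("chance", [(0.5, ("leaf", 4)), (0.5, ("leaf", -2))]),
    ("chance", [(0.25, ("leaf", 10)),
                (0.75, ("min", [("leaf", 2), ("leaf", 6)]))]),
])
full_window_value = star1(tree, -10, 10, -10, 10)
```

With the full window the pruned search agrees with the unpruned reference, while a narrower window lets chance nodes stop as soon as their bound is decided. Star-2 adds a probing phase to tighten these bounds earlier and is not shown here.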

Thirdly, a strong Stratego program should have some degree of inventiveness so that it can avoid predictable play. The game’s intricacy comes from information being concealed from the players. A program which plays too predictably (that is, according to known or obvious tactics) has a significant disadvantage against a more creative opponent. There is a balance, however, between tactics’ being novel and being foolish. Current strong Stratego programs have been developed by human experts (such as Vincent deBoer), whose tactical preferences are hard-coded into those programs. Since we claim no especial talent for Stratego ourselves, part of the development challenge will be to allow the program to discover tactical preferences and advantages on its own. Withholding explicitly programmed heuristics and allowing machines to discover tactics on their own has led to original and powerful computer play in the past (note Tesauro’s success with TD-Gammon). We hope our program will likewise learn to play competitively without depending on instruction from a mediocre or predictable player. Various techniques from machine learning, including both supervised and unsupervised learning, are applied to this objective. At our disposal are more than 50,000 match records from an online Stratego site. Part of developing a strong player will involve separating the truly advantageous features in these data from features which are merely frequent. The learning process must be objective enough to avoid bias and predictability, yet robust enough to exploit utility. We introduce a modeling method which allows partial instruction as guidelines for feature detection.

File Format

PDF
