Value functions for depth-limited solving in zero-sum imperfect-information games

Times cited: 3

Authors:

Kovarik, Vojtech [1]
Seitz, Dominik [1]
Lisy, Viliam [1]
Rudolf, Jan [1]
Sun, Shuo [1]
Ha, Karel [1]

Affiliation:

[1] Czech Tech Univ, Artificial Intelligence Ctr, FEE, Prague, Czech Republic

Source:

ARTIFICIAL INTELLIGENCE | 2023 / Vol. 314

Keywords:

Imperfect information game; Multiagent reinforcement learning; Extensive form game; Partially observable stochastic game; Depth limited game; Depth limited solving; Value function; Counterfactual regret minimization;

DOI:

10.1016/j.artint.2022.103805

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We provide a formal definition of depth-limited games together with an accessible and rigorous explanation of the underlying concepts, both of which were previously missing in imperfect-information games. The definition works for an arbitrary (perfect-recall) extensive-form game and is not tied to any specific game-solving algorithm. Moreover, this framework unifies and significantly extends three approaches to depth-limited solving that previously existed in extensive-form games and multiagent reinforcement learning but were not known to be compatible. A key ingredient of these depth-limited games is value functions. Focusing on two-player zero-sum imperfect-information games, we show how to obtain optimal value functions and prove that public information provides both necessary and sufficient context for computing them. We provide a domain-independent encoding of the domain that allows value functions to be approximated even by simple feed-forward neural networks, which are then able to generalize to unseen parts of the game. We use the resulting value network to implement a depth-limited version of counterfactual regret minimization (CFR). In three distinct domains, we show that the algorithm's exploitability depends roughly linearly on the value network's quality, and that it is not difficult to train a value network with which depth-limited CFR performs as well as CFR with access to the full game.

© 2022 Published by Elsevier B.V.
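
The core idea in the abstract, cutting the game off at a depth limit, replacing everything below it with a value function, and running CFR on what remains, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a toy two-stage simultaneous-move zero-sum game in which both first-stage actions are revealed before the continuation subgame starts, so every leaf of the depth-limited game is a public state with no private information and its value is a single number. In general the value function needs more context (the abstract states that public information is both necessary and sufficient). With one decision per player, CFR reduces to self-play regret matching. The helper names `solve_matrix_game`, `exact_value_fn` and `depth_limited_solve` are hypothetical.

```python
import numpy as np


def regret_matching(regrets):
    """Strategy proportional to positive cumulative regret (uniform if none)."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))


def solve_matrix_game(A, iters=10_000):
    """Self-play regret matching (CFR with a single decision per player).

    A[i, j] is the payoff to the row player (maximizer); the column player
    receives -A[i, j]. Returns the average strategies, which approach a Nash
    equilibrium in two-player zero-sum games.
    """
    m, n = A.shape
    r1, r2 = np.zeros(m), np.zeros(n)
    sum1, sum2 = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        x, y = regret_matching(r1), regret_matching(r2)
        sum1 += x
        sum2 += y
        u1 = A @ y       # row player's payoff for each pure action
        u2 = -(x @ A)    # column player's payoff for each pure action
        r1 += u1 - x @ u1
        r2 += u2 - y @ u2
    return sum1 / sum1.sum(), sum2 / sum2.sum()


def exploitability(A, x, y):
    """Sum of both players' best-response gains against (x, y); zero at equilibrium."""
    return float((A @ y).max() - (x @ A).min())


def exact_value_fn(subgame):
    """Hypothetical 'perfect' value function: solve the continuation subgame."""
    x, y = solve_matrix_game(subgame)
    return float(x @ subgame @ y)


def depth_limited_solve(subgames, value_fn):
    """Solve stage 1 only; each continuation subgame is replaced by value_fn."""
    m, n = len(subgames), len(subgames[0])
    V = np.array([[value_fn(subgames[i][j]) for j in range(n)] for i in range(m)])
    x, y = solve_matrix_game(V)
    return x, y, float(x @ V @ y), exploitability(V, x, y)


# Hypothetical example: each continuation is matching pennies shifted by a
# constant c, so its exact value is c and the leaf matrix is [[3, 2], [1, 0]].
pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
offsets = [[3.0, 2.0], [1.0, 0.0]]
subgames = [[pennies + c for c in row] for row in offsets]
x, y, value, expl = depth_limited_solve(subgames, exact_value_fn)
print(x, y, value, expl)
```

Swapping `exact_value_fn` for an approximate one (for instance, a trained network) is where the value function's quality enters. Note that `expl` here measures exploitability within the depth-limited game only, not in the full game.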


Pages: 51
