Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning

Cited by: 2
Authors
Shi, Daming [1 ]
Guo, Xudong [1 ]
Liu, Yi [1 ]
Fan, Wenhui [1 ]
Affiliations
[1] Tsinghua University, Department of Automation, Beijing 100084, People's Republic of China
Keywords
multi-agent; reinforcement learning; Actor-Critic; poker; multi-player; optimal policy; GAME; GO;
DOI
10.3390/e24060774
Chinese Library Classification
O4 [Physics];
Discipline Code
0702;
Abstract
Poker has long been considered a challenging problem in both artificial intelligence and game theory because it is characterized by imperfect information and uncertainty, which are shared by many realistic problems such as auctions, pricing, cyber security, and operations. However, it remains unclear whether playing an equilibrium policy in multi-player games is wise, and it is infeasible to validate theoretically whether a policy is optimal. Designing an effective optimal-policy learning method therefore has greater practical significance. This paper proposes an optimal policy learning method for multi-player poker games based on Actor-Critic reinforcement learning. Firstly, it builds an Actor network that makes decisions with imperfect information and a Critic network that evaluates policies with perfect information. Secondly, it proposes two novel multi-player poker policy update methods: the asynchronous policy update algorithm (APU) for multi-player multi-policy scenarios and the dual-network asynchronous policy update algorithm (Dual-APU) for multi-player sharing-policy scenarios. Finally, it uses six-player Texas hold 'em, the most popular variant, to validate the performance of the proposed optimal policy learning method. The experiments demonstrate that the policies learned by the proposed methods perform well and gain steadily compared with existing approaches. In sum, policy learning methods for imperfect-information games based on Actor-Critic reinforcement learning perform well on poker and can be transferred to other imperfect-information games. Training with perfect information and testing with imperfect information in this way yields an effective and explainable approach to learning an approximately optimal policy.
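To make the asymmetric training setup described in the abstract concrete, below is a minimal PyTorch-style sketch of an Actor that acts on an imperfect-information observation (the acting player's view) and a Critic that is trained on the perfect-information state (all hidden cards revealed). All names, layer sizes, and feature dimensions (Actor, Critic, actor_critic_update, obs_dim, state_dim, hidden) are illustrative assumptions, not the paper's exact architecture.

# Hypothetical sketch of the asymmetric Actor-Critic setup: the Actor sees only
# the imperfect-information observation; the Critic scores the perfect-information
# state and is used only during training. Sizes and names are assumptions.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class Actor(nn.Module):
    """Policy network: maps an imperfect-information observation to action logits."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> Categorical:
        return Categorical(logits=self.net(obs))

class Critic(nn.Module):
    """Value network: evaluates the perfect-information state (training only)."""
    def __init__(self, state_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)

def actor_critic_update(actor, critic, opt_a, opt_c, obs, state, action, ret):
    """One advantage Actor-Critic step: Critic sees the full state, Actor only obs."""
    value = critic(state)                       # perfect-information value estimate
    advantage = (ret - value).detach()          # advantage used as policy-gradient weight
    policy_loss = -(actor(obs).log_prob(action) * advantage).mean()
    value_loss = nn.functional.mse_loss(value, ret)
    opt_a.zero_grad()
    policy_loss.backward()
    opt_a.step()
    opt_c.zero_grad()
    value_loss.backward()
    opt_c.step()

In this reading, the sketch corresponds to one player's networks; APU and Dual-APU, as described in the abstract, would then govern how the six players' copies are updated asynchronously, either each with its own policy (APU) or all sharing a single policy through two networks (Dual-APU).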
Pages: 18