Monotonic Model Improvement Self-play Algorithm for Adversarial Games

Cited by: 0
Authors
Sundar, Poorna Syama [1 ]
Vasam, Manjunath [1 ]
Joseph, Ajin George [1 ]
Institutions
[1] Indian Inst Technol Tirupati, Dept Comp Sci & Engn, Tirupati, Andhra Pradesh, India
Keywords
DOI
10.1109/CDC49753.2023.10383417
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
The problem of solving strategy games has intrigued the scientific community for centuries. In this paper, we consider two-player adversarial zero-sum symmetric games with zero information loss, i.e., games of perfect information. Both players continuously attempt to make decisions that change the current game state to their advantage, and hence the gains of one player always equal the losses of the other. We propose a model improvement self-play algorithm in which the agent iteratively switches roles to subdue the current adversary strategy. This monotonic improvement sequence ultimately yields a monolithic, competent, absolute no-loss policy for the game environment. This tactic is the first of its kind in the setting of two-player adversarial games. Our approach performs competitively, and sometimes at an expert level, in games such as 4x4 tic-tac-toe, 5x5 domineering, cram, and dots & boxes, using a minimum number of moves.
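The abstract describes the self-play loop only at a high level. Below is a minimal Python sketch of the idea under stated assumptions: the game (a tiny Nim instance, standing in for the symmetric zero-sum perfect-information games above), the exact best-response search, and the helper names `best_response`, `moves`, and `step` are all illustrative, not the authors' implementation. The learner computes a best response to the current frozen adversary policy, adopts it, switches roles, and repeats until the policy stops changing.

```python
# Illustrative sketch of a monotonic-improvement self-play loop,
# NOT the paper's implementation. Game, search, and names are assumptions.

PILES = (1, 3, 5)  # tiny normal-play Nim instance; player unable to move loses


def moves(state):
    """All legal moves: remove k >= 1 objects from pile i."""
    return [(i, k) for i, p in enumerate(state) for k in range(1, p + 1)]


def step(state, move):
    """Apply a move and return the successor state."""
    i, k = move
    s = list(state)
    s[i] -= k
    return tuple(s)


def best_response(opponent):
    """Exact best response to a fixed opponent policy via game-tree search.
    Returns (policy, value); value is +1 for a learner win, -1 for a loss."""
    policy = {}

    def value(state, our_turn):
        legal = moves(state)
        if not legal:
            # The player to move cannot move and loses.
            return -1 if our_turn else +1
        if our_turn:
            best = -2  # below any achievable value, so a move is always recorded
            for m in legal:
                v = value(step(state, m), False)
                if v > best:
                    best, policy[state] = v, m
            return best
        # The adversary plays its frozen policy (arbitrary default if unseen).
        return value(step(state, opponent.get(state, legal[0])), True)

    root_value = value(PILES, True)
    return policy, root_value


# Self-play: start from an arbitrary policy; at each iteration the learner
# becomes the best response to the current adversary, then roles switch.
policy = {}
for it in range(20):
    new_policy, v = best_response(policy)
    print(f"iteration {it}: best-response value vs frozen adversary = {v:+d}")
    if new_policy == policy:  # fixed point: no further improvement possible
        break
    policy = new_policy
```

In the larger games reported above, the exhaustive best-response search would presumably be replaced by a learned value function or policy model; the fixed-point check then becomes an approximate no-improvement test rather than exact policy equality.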
Pages: 5600 - 5605
Page count: 6