Learning self-play agents for combinatorial optimization problems

Cited by: 6
Authors
Xu, Ruiyang [1 ]
Lieberherr, Karl [1 ]
Affiliation
[1] Northeastern Univ, Khoury Coll Comp Sci, Boston, MA 02115 USA
Source
The Knowledge Engineering Review
Keywords
GAME; GO;
DOI
10.1017/S026988892000020X
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent progress in reinforcement learning (RL) using self-play has shown remarkable performance with several board games (e.g., Chess and Go) and video games (e.g., Atari games and Dota2). It is plausible to hypothesize that RL, starting from zero knowledge, might be able to gradually approach a winning strategy after a certain amount of training. In this paper, we explore neural Monte Carlo Tree Search (neural MCTS), an RL algorithm that has been applied successfully by DeepMind to play Go and Chess at a superhuman level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka's Game-Theoretical Semantics, we propose the Zermelo Gamification to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problems. A specially designed neural MCTS algorithm is then introduced to train Zermelo game agents. We use a prototype problem for which the ground-truth policy is efficiently computable to demonstrate that neural MCTS is promising.
Pages: 18
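
A minimal self-play sketch (not from the record above): the abstract describes training Zermelo-game agents by self-play with a neural MCTS of the AlphaZero family. The Python sketch below shows the general shape of such a loop, a PUCT-guided tree search driving greedy self-play on a toy alternating-move game; the NimGame class, the uniform-prior policy_value() stub, and the constants (200 simulations, c_puct = 1.5) are illustrative assumptions, not the authors' Zermelo gamification or their implementation.

# Minimal PUCT-style neural MCTS self-play loop (illustrative sketch only).
# Assumptions, not taken from the paper: a toy alternating-move game (Nim)
# stands in for a Zermelo game, and policy_value() is a uniform-prior stub
# standing in for a trained policy/value network.
import math

class NimGame:
    """Two players alternately take 1-3 stones; whoever takes the last stone wins."""
    def __init__(self, stones=10):
        self.stones = stones
        self.player = 1          # player to move: +1 or -1

    def clone(self):
        g = NimGame(self.stones)
        g.player = self.player
        return g

    def legal_moves(self):
        return [m for m in (1, 2, 3) if m <= self.stones]

    def step(self, move):
        self.stones -= move
        self.player = -self.player

    def terminal(self):
        return self.stones == 0

    def winner(self):
        return -self.player      # the player who just moved took the last stone

def policy_value(game):
    """Stand-in for the neural network: uniform move prior, neutral value."""
    moves = game.legal_moves()
    return {m: 1.0 / len(moves) for m in moves}, 0.0

class Node:
    def __init__(self, prior):
        self.prior = prior       # P(s, a) supplied by the network
        self.visits = 0          # N(s, a)
        self.value_sum = 0.0     # W(s, a)
        self.children = {}       # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def mcts(root_game, simulations=200, c_puct=1.5):
    """Run PUCT simulations from root_game; return visit counts per root move."""
    prior, _ = policy_value(root_game)
    root = Node(1.0)
    root.children = {m: Node(p) for m, p in prior.items()}
    for _ in range(simulations):
        game, node, path = root_game.clone(), root, []
        # Selection: follow the PUCT rule until reaching an unexpanded leaf.
        while node.children:
            total = sum(c.visits for c in node.children.values())
            move, node = max(
                node.children.items(),
                key=lambda kv: kv[1].q()
                + c_puct * kv[1].prior * math.sqrt(total + 1) / (1 + kv[1].visits))
            game.step(move)
            path.append(node)
        # Expansion/evaluation: true outcome at terminal states, network elsewhere.
        if game.terminal():
            value = 1.0 if game.winner() == game.player else -1.0
        else:
            prior, value = policy_value(game)
            node.children = {m: Node(p) for m, p in prior.items()}
        # Backup: alternate the sign at each ply (zero-sum, alternating moves).
        for n in reversed(path):
            value = -value
            n.visits += 1
            n.value_sum += value
    return {m: c.visits for m, c in root.children.items()}

if __name__ == "__main__":
    game = NimGame(10)
    while not game.terminal():
        visits = mcts(game)
        move = max(visits, key=visits.get)   # greedy self-play move
        print(f"stones={game.stones:2d} player={game.player:+d} takes {move}")
        game.step(move)
    print("winner:", game.winner())

In an AlphaZero-style setup, the visit counts returned by mcts() and the final game outcomes from each self-play game would serve as the training targets for the policy and value heads of the network that replaces policy_value().
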
Related papers
50 records in total
  • [31] A Generalized Framework for Self-Play Training
    Hernandez, Daniel
    Denamganai, Kevin
    Gao, Yuan
    York, Peter
    Devlin, Sam
    Samothrakis, Spyridon
    Walker, James Alfred
    2019 IEEE CONFERENCE ON GAMES (COG), 2019,
  • [32] An approach to solving combinatorial optimization problems using a population of reinforcement learning agents
    Miagkikh, VV
    Punch, WF
    GECCO-99: PROCEEDINGS OF THE GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 1999: 1358 - 1365
  • [33] Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play
    van der Ree, Michiel
    Wiering, Marco
    PROCEEDINGS OF THE 2013 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING (ADPRL), 2013: 108 - 115
  • [34] A Proposal of Score Distribution Predictive Model in Self-Play Deep Reinforcement Learning
    Kagoshima, Kazuya
    Sakaji, Hiroki
    Noda, Itsuki
    Transactions of the Japanese Society for Artificial Intelligence, 2024, 39 (05)
  • [35] Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
    Soemers, Dennis J. N. J.
    Piette, Eric
    Stephenson, Matthew
    Browne, Cameron
    2019 IEEE CONFERENCE ON GAMES (COG), 2019,
  • [36] Self-play learning strategies for resource assignment in Open-RAN networks
    Wang, Xiaoyang
    Thomas, Jonathan D.
    Piechocki, Robert J.
    Kapoor, Shipra
    Santos-Rodriguez, Raul
    Parekh, Arjun
    COMPUTER NETWORKS, 2022, 206
  • [37] Self-Play or Group Practice: Learning to Play Alternating Markov Game in Multi-Agent System
    Leung, Chin-Wing
    Hu, Shuyue
    Leung, Ho-Fung
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 9234 - 9241
  • [38] Advancing Air Combat Tactics with Improved Neural Fictitious Self-play Reinforcement Learning
    He, Shaoqin
    Gao, Yang
    Zhang, Baofeng
    Chang, Hui
    Zhang, Xinchen
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT V, 2023, 14090 : 653 - 666
  • [39] Mastering table tennis with hierarchy: a reinforcement learning approach with progressive self-play training
    Ma, Hongxu
    Fan, Jianyin
    Xu, Haoran
    Wang, Qiang
    APPLIED INTELLIGENCE, 2025, 55 (06)
  • [40] A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
    Silver, David
    Hubert, Thomas
    Schrittwieser, Julian
    Antonoglou, Ioannis
    Lai, Matthew
    Guez, Arthur
    Lanctot, Marc
    Sifre, Laurent
    Kumaran, Dharshan
    Graepel, Thore
    Lillicrap, Timothy
    Simonyan, Karen
    Hassabis, Demis
    SCIENCE, 2018, 362 (6419): 1140+