Learning self-play agents for combinatorial optimization problems

Cited by: 6
Authors: Xu, Ruiyang [1]; Lieberherr, Karl [1]
Affiliation: [1] Northeastern Univ, Khoury Coll Comp Sci, Boston, MA 02115 USA
Keywords: GAME; GO
DOI: 10.1017/S026988892000020X
Chinese Library Classification (CLC): TP18 [Theory of artificial intelligence]
Subject classification codes: 081104; 0812; 0835; 1405
Pages: 18
Abstract:
Recent progress in reinforcement learning (RL) using self-play has shown remarkable performance with several board games (e.g., Chess and Go) and video games (e.g., Atari games and Dota2). It is plausible to hypothesize that RL, starting from zero knowledge, might be able to gradually approach a winning strategy after a certain amount of training. In this paper, we explore neural Monte Carlo Tree Search (neural MCTS), an RL algorithm that has been applied successfully by DeepMind to play Go and Chess at a superhuman level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka's Game-Theoretical Semantics, we propose the Zermelo Gamification to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problems. A specially designed neural MCTS algorithm is then introduced to train Zermelo game agents. We use a prototype problem for which the ground-truth policy is efficiently computable to demonstrate that neural MCTS is promising.
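For concreteness, here is a minimal sketch in Python of the AlphaZero-style PUCT selection rule at the heart of the neural MCTS the abstract describes. The function name, the dictionary-based node representation, the value of c_puct, and the toy statistics are illustrative assumptions, not code from the paper.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing the AlphaZero-style PUCT score.

    children: list of dicts with keys
      'q' -- mean action value observed in past simulations
      'p' -- prior probability assigned by the policy network
      'n' -- visit count of this child
    c_puct: exploration constant (1.5 is an arbitrary illustrative choice).
    """
    total_visits = sum(c['n'] for c in children)

    def score(c):
        # Exploitation term (q) plus an exploration bonus that grows with
        # the parent's total visits and decays as this child is revisited.
        return c['q'] + c_puct * c['p'] * math.sqrt(total_visits) / (1 + c['n'])

    return max(children, key=score)

# Toy usage: three candidate moves with made-up statistics.
children = [
    {'q': 0.10, 'p': 0.5, 'n': 12},
    {'q': 0.30, 'p': 0.3, 'n': 4},
    {'q': 0.05, 'p': 0.2, 'n': 1},
]
print(puct_select(children))  # selects the second move on these numbers
```

Each simulation step trades the network's prior p and the observed mean value q against an exploration bonus that shrinks as a move is visited more often; under the Zermelo Gamification, searching for a winning strategy amounts to repeatedly applying a step of this kind for whichever player is to move.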