Learning self-play agents for combinatorial optimization problems

Cited by: 6
Authors
Xu, Ruiyang [1 ]
Lieberherr, Karl [1 ]
Affiliation
[1] Northeastern Univ, Khoury Coll Comp Sci, Boston, MA 02115 USA
Source
The Knowledge Engineering Review
Keywords
GAME; GO;
DOI
10.1017/S026988892000020X
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent progress in reinforcement learning (RL) using self-play has shown remarkable performance with several board games (e.g., Chess and Go) and video games (e.g., Atari games and Dota2). It is plausible to hypothesize that RL, starting from zero knowledge, might be able to gradually approach a winning strategy after a certain amount of training. In this paper, we explore neural Monte Carlo Tree Search (neural MCTS), an RL algorithm that has been applied successfully by DeepMind to play Go and Chess at a superhuman level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka's Game-Theoretical Semantics, we propose the Zermelo Gamification to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problems. A specially designed neural MCTS algorithm is then introduced to train Zermelo game agents. We use a prototype problem for which the ground-truth policy is efficiently computable to demonstrate that neural MCTS is promising.
Pages: 18
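
A minimal self-play sketch (not from the record above): the abstract describes training Zermelo-game agents by self-play with a neural MCTS of the AlphaZero family. The Python sketch below shows the general shape of such a loop, a PUCT-guided tree search driving greedy self-play on a toy alternating-move game; the NimGame class, the uniform-prior policy_value() stub, and the constants (200 simulations, c_puct = 1.5) are illustrative assumptions, not the authors' Zermelo gamification or their implementation.

# Minimal PUCT-style neural MCTS self-play loop (illustrative sketch only).
# Assumptions, not taken from the paper: a toy alternating-move game (Nim)
# stands in for a Zermelo game, and policy_value() is a uniform-prior stub
# standing in for a trained policy/value network.
import math

class NimGame:
    """Two players alternately take 1-3 stones; whoever takes the last stone wins."""
    def __init__(self, stones=10):
        self.stones = stones
        self.player = 1          # player to move: +1 or -1

    def clone(self):
        g = NimGame(self.stones)
        g.player = self.player
        return g

    def legal_moves(self):
        return [m for m in (1, 2, 3) if m <= self.stones]

    def step(self, move):
        self.stones -= move
        self.player = -self.player

    def terminal(self):
        return self.stones == 0

    def winner(self):
        return -self.player      # the player who just moved took the last stone

def policy_value(game):
    """Stand-in for the neural network: uniform move prior, neutral value."""
    moves = game.legal_moves()
    return {m: 1.0 / len(moves) for m in moves}, 0.0

class Node:
    def __init__(self, prior):
        self.prior = prior       # P(s, a) supplied by the network
        self.visits = 0          # N(s, a)
        self.value_sum = 0.0     # W(s, a)
        self.children = {}       # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def mcts(root_game, simulations=200, c_puct=1.5):
    """Run PUCT simulations from root_game; return visit counts per root move."""
    prior, _ = policy_value(root_game)
    root = Node(1.0)
    root.children = {m: Node(p) for m, p in prior.items()}
    for _ in range(simulations):
        game, node, path = root_game.clone(), root, []
        # Selection: follow the PUCT rule until reaching an unexpanded leaf.
        while node.children:
            total = sum(c.visits for c in node.children.values())
            move, node = max(
                node.children.items(),
                key=lambda kv: kv[1].q()
                + c_puct * kv[1].prior * math.sqrt(total + 1) / (1 + kv[1].visits))
            game.step(move)
            path.append(node)
        # Expansion/evaluation: true outcome at terminal states, network elsewhere.
        if game.terminal():
            value = 1.0 if game.winner() == game.player else -1.0
        else:
            prior, value = policy_value(game)
            node.children = {m: Node(p) for m, p in prior.items()}
        # Backup: alternate the sign at each ply (zero-sum, alternating moves).
        for n in reversed(path):
            value = -value
            n.visits += 1
            n.value_sum += value
    return {m: c.visits for m, c in root.children.items()}

if __name__ == "__main__":
    game = NimGame(10)
    while not game.terminal():
        visits = mcts(game)
        move = max(visits, key=visits.get)   # greedy self-play move
        print(f"stones={game.stones:2d} player={game.player:+d} takes {move}")
        game.step(move)
    print("winner:", game.winner())

In an AlphaZero-style setup, the visit counts returned by mcts() and the final game outcomes from each self-play game would serve as the training targets for the policy and value heads of the network that replaces policy_value().
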
Related papers
50 records in total
  • [31] A Generalized Framework for Self-Play Training
    Hernandez, Daniel
    Denamganai, Kevin
    Gao, Yuan
    York, Peter
    Devlin, Sam
    Samothrakis, Spyridon
    Walker, James Alfred
    2019 IEEE CONFERENCE ON GAMES (COG), 2019,
  • [32] An approach to solving combinatorial optimization problems using a population of reinforcement learning agents
    Miagkikh, VV
    Punch, WF
    GECCO-99: PROCEEDINGS OF THE GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 1999: 1358 - 1365
  • [33] Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play
    van der Ree, Michiel
    Wiering, Marco
    PROCEEDINGS OF THE 2013 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING (ADPRL), 2013: 108 - 115
  • [34] A Proposal of Score Distribution Predictive Model in Self-Play Deep Reinforcement Learning
    Kagoshima, Kazuya
    Sakaji, Hiroki
    Noda, Itsuki
    Transactions of the Japanese Society for Artificial Intelligence, 2024, 39 (05)
  • [35] Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
    Soemers, Dennis J. N. J.
    Piette, Eric
    Stephenson, Matthew
    Browne, Cameron
    2019 IEEE CONFERENCE ON GAMES (COG), 2019,
  • [36] Self-play learning strategies for resource assignment in Open-RAN networks
    Wang, Xiaoyang
    Thomas, Jonathan D.
    Piechocki, Robert J.
    Kapoor, Shipra
    Santos-Rodriguez, Raul
    Parekh, Arjun
    COMPUTER NETWORKS, 2022, 206
  • [37] Self-Play or Group Practice: Learning to Play Alternating Markov Game in Multi-Agent System
    Leung, Chin-Wing
    Hu, Shuyue
    Leung, Ho-Fung
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 9234 - 9241
  • [38] Advancing Air Combat Tactics with Improved Neural Fictitious Self-play Reinforcement Learning
    He, Shaoqin
    Gao, Yang
    Zhang, Baofeng
    Chang, Hui
    Zhang, Xinchen
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT V, 2023, 14090 : 653 - 666
  • [39] Mastering table tennis with hierarchy: a reinforcement learning approach with progressive self-play training
    Ma, Hongxu
    Fan, Jianyin
    Xu, Haoran
    Wang, Qiang
    APPLIED INTELLIGENCE, 2025, 55 (06)
  • [40] A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
    Silver, David
    Hubert, Thomas
    Schrittwieser, Julian
    Antonoglou, Ioannis
    Lai, Matthew
    Guez, Arthur
    Lanctot, Marc
    Sifre, Laurent
    Kumaran, Dharshan
    Graepel, Thore
    Lillicrap, Timothy
    Simonyan, Karen
    Hassabis, Demis
    SCIENCE, 2018, 362 (6419): 1140+