Mastering the game of Stratego with model-free multiagent reinforcement learning

Cited by: 48
Authors
Perolat, Julien [1 ]
De Vylder, Bart [1 ]
Hennes, Daniel [1 ]
Tarassov, Eugene [1 ]
Strub, Florian [1 ]
de Boer, Vincent [1 ]
Muller, Paul [1 ]
Connor, Jerome T. [1 ]
Burch, Neil [1 ]
Anthony, Thomas [1 ]
McAleer, Stephen [1 ]
Elie, Romuald [1 ]
Cen, Sarah H. [1 ]
Wang, Zhe [1 ]
Gruslys, Audrunas [1 ]
Malysheva, Aleksandra [1 ]
Khan, Mina [1 ]
Ozair, Sherjil [1 ]
Timbers, Finbarr [1 ]
Pohlen, Toby [1 ]
Eccles, Tom [1 ]
Rowland, Mark [1 ]
Lanctot, Marc [1 ]
Lespiau, Jean-Baptiste [1 ]
Piot, Bilal [1 ]
Omidshafiei, Shayegan [1 ]
Lockhart, Edward [1 ]
Sifre, Laurent [1 ]
Beauguerlange, Nathalie [1 ]
Munos, Remi [1 ]
Silver, David [1 ]
Singh, Satinder [1 ]
Hassabis, Demis [1 ]
Tuyls, Karl [1 ]
Affiliations
[1] DeepMind Technologies Ltd, London, England
Keywords
LEVEL; DYNAMICS; GO
DOI
10.1126/science.add4679
Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract
We introduce DeepNash, an autonomous agent that plays the imperfect-information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego through self-play from scratch. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a year-to-date (2022) and all-time top-three ranking on the Gravon games platform, competing with human expert players.
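
The record gives only the abstract; the paper's underlying algorithm is Regularized Nash Dynamics (R-NaD). Below is a minimal Python sketch of that core idea: self-play learning dynamics on a reward regularized toward a slowly updated reference policy, with the reference then moved to the new fixed point. The toy game (rock-paper-scissors), the update rule, and all hyperparameters are illustrative assumptions for this sketch, not the published implementation.

import numpy as np

# Illustrative sketch only, NOT the published R-NaD implementation:
# model-free self-play in a zero-sum game whose reward is regularized
# toward a reference policy, so the dynamics settle instead of cycling.

PAYOFF = np.array([[ 0., -1.,  1.],   # row player's payoff: rock, paper, scissors
                   [ 1.,  0., -1.],
                   [-1.,  1.,  0.]])

def inner_dynamics(pi, ref, lr=0.1, eta=0.2, steps=2000):
    """Run mirror-ascent-style dynamics on the KL-regularized game."""
    for _ in range(steps):
        q = PAYOFF @ pi                  # payoff of each action vs. the self-play opponent
        # Reward transformation: penalize deviation from the reference policy.
        q_reg = q - eta * (np.log(pi + 1e-12) - np.log(ref + 1e-12))
        # Multiplicative-weights step on the regularized advantages.
        logits = np.log(pi + 1e-12) + lr * (q_reg - pi @ q_reg)
        pi = np.exp(logits - logits.max())
        pi /= pi.sum()
    return pi

pi = np.full(3, 1.0 / 3.0)
ref = np.array([0.8, 0.1, 0.1])  # deliberately biased initial reference
for _ in range(20):              # outer loop: move the reference to the new fixed point
    pi = inner_dynamics(pi, ref)
    ref = pi.copy()
print(np.round(pi, 3))           # approaches the uniform Nash equilibrium [1/3, 1/3, 1/3]

In DeepNash itself, the policy is a deep network over Stratego observations and the update is estimated model-free from sampled self-play games, rather than computed from an explicit payoff matrix as in this toy example.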
Pages: 990+
Page count: 7