A reinforcement learning-based strategy updating model for the cooperative evolution

Cited by: 4
Authors
Wang, Xianjia [1, 2, 3, 4]
Yang, Zhipeng [1, 2]
Liu, Yanli [1, 2]
Chen, Guici [1, 2]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Sci, Wuhan 430065, Hubei, Peoples R China
[2] Wuhan Univ Sci & Technol, Hubei Prov Key Lab Syst Sci Met Proc, Wuhan 430065, Hubei, Peoples R China
[3] Wuhan Univ, Econ & Management Sch, Wuhan 430072, Hubei, Peoples R China
[4] Wuhan Univ, Inst Syst Engn, Wuhan 430072, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Evolutionary game; Cooperation; Prisoner's dilemma game; PRISONERS-DILEMMA; EMERGENCE; DYNAMICS; LEVEL; GAME;
DOI
10.1016/j.physa.2023.128699
Chinese Library Classification number
O4 [Physics];
Discipline classification code
0702;
Abstract
The emergence of cooperation between competing agents has been commonly studied through evolutionary games, but such cooperation often requires a mechanism or a third party to be activated and kept alive. To investigate how a mechanism affects the evolution of cooperation, this paper proposes an innovative reinforcement learning-based strategy updating model. The model consists of two symmetrical sets of convolutional neural networks. Besides, the agents' strategies updating rules are defined: firstly, the agents learn and predict the environment and the behaviors of neighboring agents, then estimate their future payoffs based on this information, and finally determine their strategies based on these estimated payoffs. Through investigating the behavior characteristics and the stable states of the network for highly intelligent agents with memory learning and prediction ability in the evolution of the prisoner's dilemma game, the results demonstrate that the game initiators who adopt the mixed optimal payoff approach can increase the number of cooperators and facilitate "global cooperation" and "repaying kindness with kindness". Although the temptation factor has little effect on the population, increasing the discount factor can expand the scale of the cooperative cluster and even achieve dynamic stability. Additionally, a smaller size of minibatch is beneficial for the evolution of cooperation in a smaller experience replay pool. A larger size of minibatch is more conducive to the evolution of cooperation with an increasing capacity of the experience replay pool. This research provides a novel perspective from reinforcement learning to understand the evolution of cooperation. (c) 2023 Elsevier B.V. All rights reserved.
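The abstract names four quantities without defining them: the temptation factor, the discount factor, the experience replay pool, and the minibatch size. The Python sketch below illustrates how these pieces commonly fit together in a reinforcement-learning treatment of the prisoner's dilemma. It is not the authors' implementation: the payoff parameterization (R = 1, T = b, S = P = 0, the "weak" prisoner's dilemma), the names `pd_payoff`, `discounted_return`, and `ReplayPool`, and all default values are assumptions made for illustration, and the convolutional networks described in the abstract are omitted.

```python
import random
from collections import deque

COOPERATE, DEFECT = 0, 1


def pd_payoff(my_action, other_action, b=1.5):
    """Payoff to the focal agent in a weak prisoner's dilemma.

    Assumed parameterization (not taken from the paper):
    mutual cooperation pays 1, defecting against a cooperator pays the
    temptation b, and every other outcome pays 0.
    """
    if my_action == COOPERATE:
        return 1.0 if other_action == COOPERATE else 0.0
    return b if other_action == COOPERATE else 0.0


def discounted_return(payoffs, gamma):
    """Sum of payoffs weighted by gamma**t, where gamma is the discount factor."""
    total = 0.0
    for r in reversed(payoffs):
        total = r + gamma * total
    return total


class ReplayPool:
    """Fixed-capacity experience replay pool; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, minibatch_size, rng=random):
        # Sample without replacement; never request more than is stored.
        k = min(minibatch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

Under these assumptions, a larger `gamma` makes the stream of future mutual-cooperation payoffs weigh more against the one-off temptation `b`, and the ratio of `minibatch_size` to `capacity` controls how much of the stored experience each learning step sees.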
Pages: 16