A reinforcement learning-based strategy updating model for the cooperative evolution

Cited by: 4

Authors:

Wang, Xianjia [1,2,3,4]

Yang, Zhipeng [1,2]

Liu, Yanli [1,2]

Chen, Guici [1,2]

Affiliations:

[1] Wuhan Univ Sci & Technol, Coll Sci, Wuhan 430065, Hubei, Peoples R China

[2] Wuhan Univ Sci & Technol, Hubei Prov Key Lab Syst Sci Met Proc, Wuhan 430065, Hubei, Peoples R China

[3] Wuhan Univ, Econ & Management Sch, Wuhan 430072, Hubei, Peoples R China

[4] Wuhan Univ, Inst Syst Engn, Wuhan 430072, Hubei, Peoples R China

Source:

PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS | 2023 / Vol. 618

Funding:

National Natural Science Foundation of China;

Keywords:

Reinforcement learning; Evolutionary game; Cooperation; Prisoner's dilemma game; PRISONERS-DILEMMA; EMERGENCE; DYNAMICS; LEVEL; GAME;

DOI:

10.1016/j.physa.2023.128699

Chinese Library Classification number:

O4 [Physics];

Discipline classification code:

0702;

Abstract:

The emergence of cooperation between competing agents has commonly been studied through evolutionary games, but such cooperation often requires a mechanism or a third party to be activated and kept alive. To investigate how a mechanism affects the evolution of cooperation, this paper proposes a reinforcement learning-based strategy updating model. The model consists of two symmetrical sets of convolutional neural networks. The agents' strategy updating rules are defined as follows: first, the agents learn and predict the environment and the behaviors of neighboring agents; then they estimate their future payoffs based on this information; finally, they determine their strategies based on these estimated payoffs. By investigating the behavioral characteristics and the stable states of the network for highly intelligent agents with memory, learning, and prediction abilities in the evolution of the prisoner's dilemma game, the results demonstrate that game initiators who adopt the mixed optimal payoff approach can increase the number of cooperators and facilitate "global cooperation" and "repaying kindness with kindness". Although the temptation factor has little effect on the population, increasing the discount factor can expand the scale of the cooperative cluster and even achieve dynamic stability. Additionally, a smaller minibatch size benefits the evolution of cooperation when the experience replay pool is small, whereas a larger minibatch size is more conducive to the evolution of cooperation as the capacity of the experience replay pool increases. This research provides a novel perspective from reinforcement learning to understand the evolution of cooperation. © 2023 Elsevier B.V. All rights reserved.
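The quantities named in the abstract (temptation factor, discount factor, experience replay pool, minibatch size) can be illustrated with a minimal Python sketch. It is not the paper's model, which uses convolutional neural networks. The payoff parameterization (reward 1 for mutual cooperation, temptation `b`, 0 otherwise) is an assumption borrowed from the common weak prisoner's dilemma, and the names `pd_payoff`, `discounted_return`, and `ReplayPool` are hypothetical.

```python
import random
from collections import deque


def pd_payoff(my_action, other_action, b=1.5):
    """Payoff to the focal agent in an ASSUMED weak prisoner's dilemma.

    Actions are "C" (cooperate) or "D" (defect); b is the temptation factor.
    Mutual cooperation pays 1, defecting against a cooperator pays b,
    and every other outcome pays 0.
    """
    if other_action != "C":
        return 0.0
    return 1.0 if my_action == "C" else b


def discounted_return(payoffs, gamma):
    """Sum of payoffs weighted by gamma**t, where gamma is the discount factor."""
    total = 0.0
    for payoff in reversed(payoffs):
        total = payoff + gamma * total
    return total


class ReplayPool:
    """Fixed-capacity experience replay pool (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once full.
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, minibatch_size, rng=random):
        # Draw without replacement; never ask for more than is stored.
        k = min(minibatch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)


pool = ReplayPool(capacity=3)
for step in range(5):
    pool.add((step, "C", pd_payoff("C", "C")))
minibatch = pool.sample(minibatch_size=2)
```

In this sketch a larger `gamma` gives more weight to payoffs received later, which is the role the abstract attributes to the discount factor, while `capacity` and `minibatch_size` correspond to the replay pool capacity and minibatch size it discusses.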


Pages: 16

Related records (50 in total):

[1] A reinforcement learning-based transformed inverse model strategy for nonlinear process control
Dutta, Debaprasad
Upreti, Simant R.
[J]. COMPUTERS & CHEMICAL ENGINEERING, 2023, 178
[2] Reinforcement Learning-Based Differential Evolution With Cooperative Coevolution for a Compensatory Neuro-Fuzzy Controller
Chen, Cheng-Hung
Liu, Chong-Bin
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (10) : 4719 - 4729
[3] Deep Reinforcement Learning-Based Intelligent Reflecting Surface for Cooperative Jamming Model Design
Lu, Shaofang
Shen, Xianhao
Zhang, Panfeng
Wu, Zhen
Chen, Yi
Wang, Li
Xie, Xiaolan
[J]. IEEE ACCESS, 2023, 11 : 98764 - 98775
[4] Deep Reinforcement Learning-Based Defense Strategy Selection
Charpentier, Axel
Boulahia-Cuppens, Nora
Cuppens, Frederic
Yaich, Reda
[J]. PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AVAILABILITY, RELIABILITY AND SECURITY, ARES 2022, 2022,
[5] Reinforcement Learning-Based Cooperative Adversarial Algorithm for UAV Cluster
Li, Yan
Gao, Yanlong
Dai, Xunhua
Nian, Xiaohong
Wang, Haibo
Xiong, HongYun
[J]. PROCEEDINGS OF 2022 INTERNATIONAL CONFERENCE ON AUTONOMOUS UNMANNED SYSTEMS, ICAUS 2022, 2023, 1010 : 1129 - 1138
[6] Distributed Highway Control: A Cooperative Reinforcement Learning-Based Approach
Kovari, Balint
Knab, Istvan Gellert
Esztergar-Kiss, Domokos
Aradi, Szilard
Becsi, Tamas
[J]. IEEE ACCESS, 2024, 12 : 104463 - 104472
[7] Study and Application of Reinforcement Learning in Cooperative Strategy of the Robot Soccer Based on BDI Model
Guo Qi
Wu Bo-ying
[J]. INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2009, 6 (02) : 91 - 96
[8] Reinforcement Learning-Based Electric Vehicles Energy Management Strategy with Battery Thermal Model
黄淦
曹童杰
韩俊华
赵萍
张光林
[J]. Journal of Donghua University(English Edition), 2023, 40 (01) : 80 - 87
[9] Reinforcement Learning-Based Cooperative Optimal Output Regulation via Distributed Adaptive Internal Model
Gao, Weinan
Mynuddin, Mohammed
Wunsch, Donald C.
Jiang, Zhong-Ping
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (10) : 5229 - 5240
[10] Reinforcement learning-based scheduling strategy for energy storage in microgrid
Zhou, Kunshu
Zhou, Kaile
Yang, Shanlin
[J]. JOURNAL OF ENERGY STORAGE, 2022, 51
