Causal Reinforcement Learning in Iterated Prisoner's Dilemma

Cited by: 2
Authors
Kazemi, Yosra [1]
Chanel, Caroline P. C. [2]
Givigi, Sidney [1]
Affiliations
[1] Queens Univ, Sch Comp, Kingston, ON K7L 2N8, Canada
[2] Univ Toulouse, Inst Super Aeronaut & Espace ISAE SUPAERO, Dept Design & Control Aerosp Vehicles, F-31013 Toulouse, France
Keywords
Causal inference; game theory; prisoner's dilemma (PD); reinforcement learning (RL); social dilemma
DOI
10.1109/TCSS.2023.3289470
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The iterated prisoner's dilemma (IPD) is an archetypal paradigm for modeling cooperation and has guided studies on social dilemmas. In this work, we develop a causal reinforcement learning (CRL) strategy in a PD game. An agent is designed to have an explicit causal representation of other agents playing strategies from the Axelrod tournament. The collection of policies is assembled in an ensemble RL to choose the best strategy. The agent is then tested against selected Axelrod tournament strategies as well as an adaptive agent trained using traditional RL. Results show that our agent is able to play against all other players and score higher while remaining adaptive in situations where the other players' strategies change. Furthermore, the decisions taken by the agent can be explained in terms of the causal representation of the interactions. Based on the decisions made by the agent, a human observer can understand the chosen strategy.
Pages: 2523-2534
Page count: 12
Related papers
50 in total
  • [1] Multiagent reinforcement learning in the Iterated Prisoner's Dilemma
    Sandholm, TW
    Crites, RH
    BIOSYSTEMS, 1996, 37 (1-2) : 147 - 166
  • [2] Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma
    Harper, Marc
    Knight, Vincent
    Jones, Martin
    Koutsovoulos, Georgios
    Glynatsi, Nikoleta E.
    Campbell, Owen
    PLOS ONE, 2017, 12 (12)
  • [3] Augmenting Reinforcement Learning to Enhance Cooperation in the Iterated Prisoner's Dilemma
    Feehan, Grace
    Fatima, Shaheen
    ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 3, 2022 : 146 - 157
  • [4] Multiagent Reinforcement Learning: Spiking and Nonspiking Agents in the Iterated Prisoner's Dilemma
    Vassiliades, Vassilis
    Cleanthous, Aristodemos
    Christodoulou, Chris
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2011, 22 (04) : 639 - 653
  • [5] Reinforcement learning in a prisoner's dilemma
    Dolgopolov, Arthur
    GAMES AND ECONOMIC BEHAVIOR, 2024, 144 : 84 - 103
  • [6] Evolution and Incremental Learning in the Iterated Prisoner's Dilemma
    Quek, Han-Yang
    Tan, Kay Chen
    Goh, Chi-Keong
    Abbass, Hussein A.
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2009, 13 (02) : 303 - 320
  • [7] Heterogeneous Strategy Learning in the Iterated Prisoner's Dilemma
    Rangoni, Ruggero
    ETICA & POLITICA, 2013, 15 (02) : 42 - 57
  • [8] Learning versus evolution in iterated prisoner's dilemma
    Hingston, P
    Kendall, G
    CEC2004: PROCEEDINGS OF THE 2004 CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1 AND 2, 2004 : 364 - 372
  • [9] Numerical analysis of a reinforcement learning model with the dynamic aspiration level in the iterated Prisoner's dilemma
    Masuda, Naoki
    Nakamura, Mitsuhiro
    JOURNAL OF THEORETICAL BIOLOGY, 2011, 278 (01) : 55 - 62
  • [10] Multiagent Reinforcement Learning with Spiking and Non-Spiking Agents in the Iterated Prisoner's Dilemma
    Vassiliades, Vassilis
    Cleanthous, Aristodemos
    Christodoulou, Chris
    ARTIFICIAL NEURAL NETWORKS - ICANN 2009, PT I, 2009, 5768 : 737 - 746