Proximal Learning With Opponent-Learning Awareness

Cited by: 0
Authors
Zhao, Stephen [1 ,2 ]
Lu, Chris [3 ]
Grosse, Roger [1 ,2 ]
Foerster, Jakob [3 ]
Institutions
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] Univ Oxford, FLAIR, Oxford, England
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Learning With Opponent-Learning Awareness (LOLA) (Foerster et al. [2018a]) is a multi-agent reinforcement learning algorithm that typically learns reciprocity-based cooperation in partially competitive environments. However, LOLA often fails to learn such behaviour on more complex policy spaces parameterized by neural networks, partly because its update rule is sensitive to the policy parameterization. This problem is especially pronounced in the opponent modeling setting, where the opponent's policy is unknown and must be inferred from observations; in such settings, LOLA is ill-specified because behaviourally equivalent opponent policies can result in non-equivalent updates. To address this shortcoming, we reinterpret LOLA as approximating a proximal operator, and then derive a new algorithm, proximal LOLA (POLA), which uses the proximal formulation directly. Unlike LOLA, the POLA updates are parameterization invariant, in the sense that when the proximal objective has a unique optimum, behaviourally equivalent policies result in behaviourally equivalent updates. We then present practical approximations to the ideal POLA update, which we evaluate in several partially competitive environments with function approximation and opponent modeling. Empirically, POLA achieves reciprocity-based cooperation more reliably than LOLA.
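A schematic of the proximal formulation described in the abstract (the notation below is illustrative; the record itself gives no formulas). The ideal update for agent 1 can be written as

    \theta_1^{t+1} \in \operatorname{arg\,max}_{\theta_1} \; J_1\big(\theta_1, \theta_2'(\theta_1)\big) \;-\; \frac{1}{2\eta}\, D\big(\pi_{\theta_1}, \pi_{\theta_1^t}\big),

where \theta_2'(\theta_1) is the opponent's anticipated (likewise proximal) update, \eta is a step-size parameter, and D is a divergence. Taking D to be the squared Euclidean distance between parameter vectors yields a LOLA-style, parameterization-sensitive update; taking D to be a divergence between the induced policies (e.g., a KL divergence) is what makes behaviourally equivalent policies receive behaviourally equivalent updates.

Below is a minimal Python sketch of the single-agent proximal ingredient, approximated by inner-loop gradient ascent on the penalized objective. All names are illustrative; this omits POLA's opponent-lookahead term and is not the authors' code.

import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    # KL divergence between two discrete distributions (both strictly positive here).
    return float(np.sum(p * (np.log(p) - np.log(q))))

def numerical_grad(f, x, eps=1e-5):
    # Central finite differences, to keep the sketch dependency-free.
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def proximal_update(J, theta_old, eta=1.0, lr=0.1, inner_steps=200):
    # Approximate argmax_theta J(theta) - KL(pi_theta || pi_theta_old) / (2 * eta)
    # by gradient ascent starting from theta_old. Because the penalty lives in
    # policy space, behaviourally equivalent theta_old values yield behaviourally
    # equivalent updates (when the optimum is unique), unlike an L2 penalty on
    # the parameters themselves.
    pi_old = softmax(theta_old)
    objective = lambda th: J(th) - kl(softmax(th), pi_old) / (2 * eta)
    theta = theta_old.copy()
    for _ in range(inner_steps):
        theta = theta + lr * numerical_grad(objective, theta)
    return theta

# Illustrative usage: maximize expected reward over three actions.
rewards = np.array([1.0, 0.0, -1.0])
J = lambda th: float(softmax(th) @ rewards)
print(proximal_update(J, np.zeros(3)))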
Pages: 13
Related papers
50 items in total
  • [1] Learning with Opponent-Learning Awareness
    Foerster, Jakob
    Chen, Richard Y.
    Al-Shedivat, Maruan
    Whiteson, Shimon
    Abbeel, Pieter
    Mordatch, Igor
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS (AAMAS '18), 2018 : 122 - 130
  • [2] COLA: Consistent Learning with Opponent-Learning Awareness
    Willi, Timon
    Letcher, Alistair
    Treutlein, Johannes
    Foerster, Jakob
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [3] Opponent learning awareness and modelling in multi-objective normal form games
    Rădulescu, Roxana
    Verstraeten, Timothy
    Zhang, Yijie
    Mannion, Patrick
    Roijers, Diederik M.
    Nowé, Ann
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (03) : 1759 - 1781
  • [4] Learning to Model Opponent Learning (Student Abstract)
    Davies, Ian
    Tian, Zheng
    Wang, Jun
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13771 - 13772
  • [5] LEARNING WITHOUT AWARENESS AND AWARENESS WITHOUT LEARNING
    WOLPIN, M
    MILGRAM, N
    PSYCHOLOGICAL REPORTS, 1962, 10 (03) : 867 - 874
  • [6] Simulation Coupled Learning for a Robotic Opponent
    Reid, James
    PROCEEDINGS OF THE 48TH ANNUAL SOUTHEAST REGIONAL CONFERENCE (ACM SE '10), 2010 : 267 - 270
  • [7] Opponent Modeling in Deep Reinforcement Learning
    He, He
    Boyd-Graber, Jordan
    Kwok, Kevin
    Daume, Hal, III
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016
  • [8] Awareness and learning
    Dawson, ME
    Clark, RE
    Hamm, AO
    Öhman, A
    Wiens, S
    PSYCHOPHYSIOLOGY, 2003, 40 : S6 - S6
  • [9] Opponent Model Selection Using Deep Learning
    Chang, Hung-Jui
    Yueh, Cheng
    Fan, Gang-Yu
    Lin, Ting-Yu
    Hsu, Tsan-sheng
    ADVANCES IN COMPUTER GAMES, ACG 2021, 2022, 13262 : 176 - 186