On-policy concurrent reinforcement learning

Cited by: 5
Authors
Banerjee, B [1]
Sen, S
Peng, J
Affiliations
[1] Tulane Univ, Dept EECS, New Orleans, LA 70118 USA
[2] Univ Tulsa, MCS Dept, Tulsa, OK 74104 USA
Keywords
on-policy reinforcement learning; multi-agent learning; game theory
DOI
10.1080/09528130412331297956
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When an agent learns in a multi-agent environment, the payoff it receives depends on the behaviour of the other agents. If the other agents are also learning, its reward distribution becomes non-stationary. This makes learning in multi-agent systems more difficult than single-agent learning. Prior attempts at value-function-based learning in such domains have used off-policy Q-learning, which does not scale well, as the cornerstone, with limited success. This paper studies on-policy modifications of such algorithms, with the promise of scalability and efficiency. In particular, it is proven that these hybrid techniques are guaranteed to converge to their desired fixed points under some restrictions. It is also shown experimentally that the new techniques can learn (from self-play) better policies than the previous algorithms (also in self-play) during some phases of the exploration.
Pages: 245-260
Page count: 16