On-policy concurrent reinforcement learning

Citations: 5

Authors:

Banerjee, B [1]

Sen, S

Peng, J

Affiliations:

[1] Tulane Univ, Dept EECS, New Orleans, LA 70118 USA

[2] Univ Tulsa, MCS Dept, Tulsa, OK 74104 USA

Source:

JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE | 2004 / Vol. 16 / No. 04

Keywords:

on-policy reinforcement learning; multi-agent learning; game theory;

DOI:

10.1080/09528130412331297956

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

When an agent learns in a multi-agent environment, the payoff it receives depends on the behaviour of the other agents. If the other agents are also learning, its reward distribution becomes non-stationary. This makes learning in multi-agent systems more difficult than single-agent learning. Prior attempts at value-function-based learning in such domains have used off-policy Q-learning, which does not scale well, as the cornerstone, with limited success. This paper studies on-policy modifications of such algorithms, with the promise of scalability and efficiency. In particular, it is proven that these hybrid techniques are guaranteed to converge to their desired fixed points under some restrictions. It is also shown experimentally that the new techniques can learn (from self-play) better policies than the previous algorithms (also in self-play) during some phases of the exploration.

Pages: 245-260

Page count: 16

50 items in total

[1] Off-policy and on-policy reinforcement learning with the Tsetlin machine
Saeed Rahimi Gorji
Ole-Christoffer Granmo
[J]. Applied Intelligence, 2023, 53 : 8596 - 8613
[2] Tabu search exploration for on-policy reinforcement learning
Abramson, M
Wechsler, H
[J]. PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003, VOLS 1-4, 2003, : 2910 - 2915
[3] Off-policy and on-policy reinforcement learning with the Tsetlin machine
Gorji, Saeed Rahimi
Granmo, Ole-Christoffer
[J]. APPLIED INTELLIGENCE, 2023, 53 (08) : 8596 - 8613
[4] On-Policy Deep Reinforcement Learning for the Average-Reward Criterion
Zhang, Yiming
Ross, Keith W.
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[5] Offline Reinforcement Learning with On-Policy Q-Function Regularization
Shi, Laixi
Dadashi, Robert
Chi, Yuejie
Castro, Pablo Samuel
Geist, Matthieu
[J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT IV, 2023, 14172 : 455 - 471
[6] Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning
Westenbroek, Tyler
Mazumdar, Eric
Fridovich-Keil, David
Prabhu, Valmik
Tomlin, Claire J.
Sastry, S. Shankar
[J]. 2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020, : 118 - 125
[7] Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
Zhong, Rujie
Zhang, Duohan
Schafer, Lukas
Albrecht, Stefano V.
Hanna, Josiah P.
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
[8] Fault-Tolerant Control of Degrading Systems with On-Policy Reinforcement Learning
Ahmed, Ibrahim
Quinones-Grueiro, Marcos
Biswas, Gautam
[J]. IFAC PAPERSONLINE, 2020, 53 (02) : 13733 - 13738
[9] Two novel on-policy reinforcement learning algorithms based on TD(λ)-methods
Wiering, Marco A.
van Hasselt, Hado
[J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 280 - +
[10] Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning
Gurumurthy, Swaminathan
Manchester, Zachary
Kolter, J. Zico
[J]. LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
