Probabilistic Policy Reuse for Safe Reinforcement Learning

Cited by: 6
Authors
Garcia, Javier [1]
Fernandez, Fernando [1]
Affiliation
[1] Univ Carlos III Madrid, Ave Univ, 30, Leganes 28911, Spain
Keywords
Reinforcement learning; case-based reasoning; software agents
DOI
10.1145/3310090
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous, continuous state and action reinforcement learning problems in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a monotonically increasing risk function that estimates the probability of ending in failure from a given state. This risk function is defined in terms of how far that state is from the part of the state space already known to the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of learned knowledge, the exploration of new actions, and requests for teacher advice in parts of the state space considered dangerous. Specifically, the pi-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the pi-reuse exploration strategy can completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.
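The mechanism described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the distance-to-known-states risk function here uses a linear ramp between hypothetical thresholds `d_min` and `d_max`, and `risk_threshold`, `teacher`, and `greedy` are placeholder names for the teacher policy and the agent's own greedy policy.

```python
import math
import random

def risk(state, known_states, d_min=0.5, d_max=2.0):
    # Monotonically increasing risk in the distance from `state` to the
    # nearest state the agent has already visited: 0 inside the known
    # region, 1 far outside it. The linear ramp between d_min and d_max
    # is an illustrative placeholder shape, not the paper's definition.
    d = min(math.dist(state, s) for s in known_states)
    return min(max((d - d_min) / (d_max - d_min), 0.0), 1.0)

def pi_reuse_action(state, teacher, greedy, psi, known_states,
                    risk_threshold=0.5):
    # In states deemed risky, always request teacher advice.
    if risk(state, known_states) >= risk_threshold:
        return teacher(state)
    # Otherwise, pi-reuse-style selection: follow the reused (teacher)
    # policy with probability psi, and the agent's own greedy policy
    # otherwise (an epsilon-greedy step would replace `greedy` while
    # learning; psi is typically decayed over the episode).
    return teacher(state) if random.random() < psi else greedy(state)
```

With `psi` decayed toward zero, control shifts from the teacher to the learned policy as the known region of the state space grows, which is the safe exploitation/exploration/advice balance the abstract describes.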
Pages: 24