Probabilistic Policy Reuse for Safe Reinforcement Learning

Cited by: 6
Authors
Garcia, Javier [1]
Fernandez, Fernando [1]
Affiliation
[1] Univ Carlos III Madrid, Ave Univ, 30, Leganes 28911, Spain
Keywords
Reinforcement learning; case-based reasoning; software agents
DOI
10.1145/3310090
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous, continuous state and action reinforcement learning problems in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a monotonically increasing risk function that estimates the probability of ending in failure from a given state. This risk function is defined in terms of how far that state is from the part of the state space already known to the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of learned knowledge, the exploration of new actions, and requests for teacher advice in parts of the state space considered dangerous. Specifically, the pi-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the pi-reuse exploration strategy can completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.
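The mechanism described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the distance-to-known-states risk function here uses a linear ramp between hypothetical thresholds `d_min` and `d_max`, and `risk_threshold`, `teacher`, and `greedy` are placeholder names for the teacher policy and the agent's own greedy policy.

```python
import math
import random

def risk(state, known_states, d_min=0.5, d_max=2.0):
    # Monotonically increasing risk in the distance from `state` to the
    # nearest state the agent has already visited: 0 inside the known
    # region, 1 far outside it. The linear ramp between d_min and d_max
    # is an illustrative placeholder shape, not the paper's definition.
    d = min(math.dist(state, s) for s in known_states)
    return min(max((d - d_min) / (d_max - d_min), 0.0), 1.0)

def pi_reuse_action(state, teacher, greedy, psi, known_states,
                    risk_threshold=0.5):
    # In states deemed risky, always request teacher advice.
    if risk(state, known_states) >= risk_threshold:
        return teacher(state)
    # Otherwise, pi-reuse-style selection: follow the reused (teacher)
    # policy with probability psi, and the agent's own greedy policy
    # otherwise (an epsilon-greedy step would replace `greedy` while
    # learning; psi is typically decayed over the episode).
    return teacher(state) if random.random() < psi else greedy(state)
```

With `psi` decayed toward zero, control shifts from the teacher to the learned policy as the known region of the state space grows, which is the safe exploitation/exploration/advice balance the abstract describes.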
Pages: 24