Probabilistic Policy Reuse for Safe Reinforcement Learning

Cited by: 6
Authors
Garcia, Javier [1]
Fernandez, Fernando [1]
Affiliations
[1] Univ Carlos III Madrid, Ave Univ, 30, Leganes 28911, Spain
Keywords
Reinforcement learning; case-based reasoning; software agents
DOI
10.1145/3310090
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous reinforcement learning problems with continuous state and action spaces, in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuous, monotonically increasing risk function that estimates the probability of ending in failure from a given state. This risk function is defined in terms of how far the state is from the state space known to the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of knowledge learned so far, the exploration of new actions, and requests for teacher advice in parts of the state space considered dangerous. Specifically, the pi-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the pi-reuse exploration strategy can be used to completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.
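The idea in the abstract, a risk value that grows with distance from the known state space and decides when to ask the teacher, can be illustrated with a minimal sketch. This is not the authors' algorithm: the logistic form of the risk, the parameters `theta`, `k`, and `epsilon`, and the function names `risk` and `choose_action` are all assumptions made for illustration only.

```python
import math
import random


def risk(state, known_states, theta=1.0, k=5.0):
    """Hypothetical risk function (an assumption, not the paper's definition).

    Logistic in the Euclidean distance from `state` to the nearest known
    state, so it is continuous and monotonically increasing in that distance.
    `theta` is the distance at which risk equals 0.5; `k` sets the steepness.
    """
    d = min(math.dist(state, s) for s in known_states)
    return 1.0 / (1.0 + math.exp(-k * (d - theta)))


def choose_action(state, known_states, teacher, learned, explore,
                  epsilon=0.1, rng=random):
    """Pick an action source in the spirit of a reuse-style exploration rule.

    With probability risk(state), follow the teacher's advice; otherwise
    explore with probability `epsilon`, and exploit the learned policy in
    the remaining cases. `teacher`, `learned`, and `explore` are callables
    mapping a state to an action (hypothetical interfaces).
    """
    if rng.random() < risk(state, known_states):
        return teacher(state)
    if rng.random() < epsilon:
        return explore(state)
    return learned(state)


if __name__ == "__main__":
    known = [(0.0, 0.0), (1.0, 0.0)]
    for s in [(0.0, 0.0), (1.5, 0.0), (10.0, 0.0)]:
        print(s, round(risk(s, known), 4))
```

Under these assumptions, a state close to those already visited yields a risk near zero, so the agent mostly exploits or explores on its own, while a state far from the known region yields a risk near one, so the teacher is consulted almost every time.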
Pages: 24