Probabilistic Policy Reuse for Safe Reinforcement Learning

Cited by: 6

Authors:

Garcia, Javier [1]

Fernandez, Fernando [1]

Affiliation:

[1] Univ Carlos III Madrid, Ave Univ, 30, Leganes 28911, Spain

Source:

ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS | 2019 / Vol. 13 / No. 3

Keywords:

Reinforcement learning; case-based reasoning; software agents

DOI:

10.1145/3310090

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous reinforcement learning problems with continuous state and action spaces, in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuous, monotonically increasing risk function that estimates the probability of ending up in failure from a given state. This risk function is defined in terms of how far the state is from the part of the state space already known to the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of the knowledge learned so far, the exploration of new actions, and requests for teacher advice in parts of the state space considered dangerous. Specifically, the π-reuse exploration strategy is used. Through experiments in the helicopter hover task and a business management problem, we show that the π-reuse exploration strategy can be used to completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.
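The abstract combines two mechanisms: a risk function that grows with the distance from the states the agent already knows, and a probabilistic choice between teacher advice, exploitation and exploration. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The logistic form of the risk, the parameters `theta`, `k`, `epsilon` and `sigma`, the use of the risk value directly as the probability of asking the teacher, and the helper names `risk` and `select_action` are all hypothetical choices made for illustration. The paper's actual π-reuse details (for example, how the reuse probability is scheduled within an episode) are not reproduced.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Case = Tuple[State, Action]  # a known (state, action) pair


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def risk(state: State, known: List[State], theta: float = 1.0, k: float = 5.0) -> float:
    """Hypothetical risk: a logistic function of the distance from `state`
    to the nearest known state. It increases monotonically with that
    distance and equals 0.5 when the distance equals `theta`."""
    if not known:
        return 1.0  # nothing known yet: treat every state as maximally risky
    d = min(euclidean(state, s) for s in known)
    return 1.0 / (1.0 + math.exp(-k * (d - theta)))


def select_action(
    state: State,
    case_base: List[Case],
    teacher: Callable[[State], Action],
    epsilon: float = 0.1,
    sigma: float = 0.1,
    theta: float = 1.0,
    k: float = 5.0,
    rng: Optional[random.Random] = None,
) -> Tuple[str, Action]:
    """Hypothetical risk-gated choice among teacher advice, exploitation
    and exploration. Returns (source, action)."""
    if rng is None:
        rng = random.Random()
    r = risk(state, [s for s, _ in case_base], theta, k)
    if rng.random() < r:
        # Dangerous region: ask the teacher (always taken when case_base is empty).
        return "teacher", teacher(state)
    # Exploit: reuse the action stored with the nearest known state.
    _, nearest_action = min(case_base, key=lambda c: euclidean(state, c[0]))
    if rng.random() < epsilon:
        # Explore: perturb the exploited action with Gaussian noise.
        return "explore", tuple(a + rng.gauss(0.0, sigma) for a in nearest_action)
    return "exploit", nearest_action


if __name__ == "__main__":
    case_base = [((0.0, 0.0), (0.5,)), ((1.0, 0.0), (-0.5,))]
    known = [s for s, _ in case_base]
    safe_teacher = lambda s: (0.0,)  # hypothetical safe baseline controller
    print(risk((0.1, 0.0), known))  # small: close to a known state
    print(risk((5.0, 5.0), known))  # close to 1: far from every known state
    print(select_action((5.0, 5.0), case_base, safe_teacher, rng=random.Random(0)))
```

Using the risk value itself as the probability of requesting advice keeps the sketch short: near well-explored states the agent rarely calls the teacher, while far from them it almost always does. In a case-based learner built along these lines, adding newly visited safe (state, action) pairs to `case_base` would shrink the region treated as risky over time.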

Pages: 24

Related articles (50 in total; first 10 listed):

[1] Learning domain structure through probabilistic policy reuse in reinforcement learning
Fernandez, Fernando
Veloso, Manuela
PROGRESS IN ARTIFICIAL INTELLIGENCE, 2013, 2 (01) : 13 - 27
[2] Survey on policy reuse in reinforcement learning
He, L.
Shen, L.
Li, H.
Wang, Z.
Tang, W.
SYSTEMS ENGINEERING AND ELECTRONICS (XI TONG GONG CHENG YU DIAN ZI JI SHU), 2022, 44 (03) : 884 - 899
[3] Policy Reuse in Deep Reinforcement Learning
Glatt, Ruben
Costa, Anna Helena Reali
THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017 : 4929 - 4930
[4] Probabilistic Guarantees for Safe Deep Reinforcement Learning
Bacci, Edoardo
Parker, David
FORMAL MODELING AND ANALYSIS OF TIMED SYSTEMS, FORMATS 2020, 2020, 12288 : 231 - 248
[5] Policy Reuse in Reinforcement Learning for Modular Agents
Raza, Sayyed Jaffar Ali
Lin, Mingjie
2019 IEEE 2ND INTERNATIONAL CONFERENCE ON INFORMATION AND COMPUTER TECHNOLOGIES (ICICT), 2019 : 165 - 169
[6] Safe Reinforcement Learning via Probabilistic Logic Shields
Yang, Wen-Chi
Marra, Giuseppe
Rens, Gavin
De Raedt, Luc
NEURAL-SYMBOLIC LEARNING AND REASONING 2023, NESY 2023, 2023
[7] Safe Reinforcement Learning via Probabilistic Logic Shields
Yang, Wen-Chi
Marra, Giuseppe
Rens, Gavin
De Raedt, Luc
PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023 : 5739 - 5749
[8] On the Reuse Bias in Off-Policy Reinforcement Learning
Ying, Chengyang
Hao, Zhongkai
Zhou, Xinning
Su, Hang
Yan, Dong
Zhu, Jun
PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023 : 4513 - 4521
[9] Reinforcement Learning Experience Reuse with Policy Residual Representation
Zhou, WenJi
Yu, Yang
Chen, Yingfeng
Guan, Kai
Lv, Tangjie
Fan, Changjie
Zhou, Zhi-Hua
PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019 : 4447 - 4453
[10] Convergent Policy Optimization for Safe Reinforcement Learning
Yu, Ming
Yang, Zhuoran
Kolar, Mladen
Wang, Zhaoran
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
