Probabilistic Policy Reuse for Safe Reinforcement Learning

Cited by: 6
Authors
Garcia, Javier [1]
Fernandez, Fernando [1]
Affiliations
[1] Univ Carlos III Madrid, Ave Univ, 30, Leganes 28911, Spain
Keywords
Reinforcement learning; case-based reasoning; software agents;
DOI
10.1145/3310090
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous, continuous state and action reinforcement learning problems in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuously increasing monotonic risk function that estimates the probability of ending up in failure from a given state. Such a risk function is defined in terms of how far that state is from the part of the state space already known by the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of already-learned knowledge, the exploration of new actions, and the request of teacher advice in parts of the state space considered dangerous. Specifically, the pi-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the pi-reuse exploration strategy can completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.
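The abstract's mechanism can be sketched in a few lines: a risk function that grows monotonically with the distance from the current state to the nearest known state, and a pi-reuse action selector that consults the teacher in high-risk regions and otherwise mixes reuse of the past (teacher) policy with epsilon-greedy use of the policy being learned. All names, the exponential risk shape, and the parameter values below are illustrative assumptions, not the paper's actual formulation:

```python
import math
import random

def risk(state, known_states, tau=1.0):
    # Monotonically increasing risk in the Euclidean distance from
    # `state` to the closest state the agent already knows; 0 at a
    # known state, approaching 1 far from the known space.
    d = min(math.dist(state, s) for s in known_states)
    return 1.0 - math.exp(-d / tau)

def pi_reuse_action(state, known_states, teacher, new_policy, actions,
                    psi=0.9, epsilon=0.1, risk_threshold=0.5):
    # Dangerous region: always request teacher advice.
    if risk(state, known_states) > risk_threshold:
        return teacher(state)
    # With probability psi, reuse the past (teacher) policy.
    if random.random() < psi:
        return teacher(state)
    # Otherwise act epsilon-greedily with the policy being learned.
    if random.random() < epsilon:
        return random.choice(actions)   # explore a new action
    return new_policy(state)            # exploit learned knowledge
```

In the pi-reuse strategy proper, psi typically decays over an episode so that reuse dominates early steps and the learned policy takes over later; the sketch keeps it fixed for brevity.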
Pages: 24
Related Papers
50 records in total (showing 41-50)
  • [41] Safe Reinforcement Learning-based Driving Policy Design for Autonomous Vehicles on Highways
    Nguyen, Hung Duy
    Han, Kyoungseok
    International Journal of Control, Automation and Systems, 2023, 21: 4098-4110
  • [42] Decentralized Policy Gradient Descent Ascent for Safe Multi-Agent Reinforcement Learning
    Lu, Songtao
    Zhang, Kaiqing
    Chen, Tianyi
    Basar, Tamer
    Horesh, Lior
    Thirty-Fifth AAAI Conference on Artificial Intelligence / Thirty-Third Conference on Innovative Applications of Artificial Intelligence / Eleventh Symposium on Educational Advances in Artificial Intelligence (AAAI 2021), 2021, 35: 8767-8775
  • [43] What Is Acceptably Safe for Reinforcement Learning?
    Bragg, John
    Habli, Ibrahim
    Computer Safety, Reliability, and Security (SAFECOMP 2018), 2018, 11094: 418-430
  • [44] A comprehensive survey on safe reinforcement learning
    García, Javier
    Fernández, Fernando
    Journal of Machine Learning Research, 2015, 16: 1437-1480
  • [45] Safe Reinforcement Learning for Sepsis Treatment
    Jia, Yan
    Burden, John
    Lawton, Tom
    Habli, Ibrahim
    2020 8th IEEE International Conference on Healthcare Informatics (ICHI 2020), 2020: 108-114
  • [46] Safe Reinforcement Learning With Dual Robustness
    Li, Zeyang
    Hu, Chuxiong
    Wang, Yunan
    Yang, Yujie
    Li, Shengbo Eben
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46 (12): 10876-10890
  • [47] Lyapunov design for safe reinforcement learning
    Perkins, TJ
    Barto, AG
    Journal of Machine Learning Research, 2003, 3 (4-5): 803-832
  • [48] Safe reinforcement learning for dynamical games
    Yang, Yongliang
    Vamvoudakis, Kyriakos G.
    Modares, Hamidreza
    International Journal of Robust and Nonlinear Control, 2020, 30 (9): 3706-3726
  • [49] Safe Reinforcement Learning via Shielding
    Alshiekh, Mohammed
    Bloem, Roderick
    Ehlers, Ruediger
    Koenighofer, Bettina
    Niekum, Scott
    Topcu, Ufuk
    Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (AAAI 2018), 2018: 2669-2678
  • [50] Safe Reinforcement Learning for Legged Locomotion
    Yang, Tsung-Yen
    Zhang, Tingnan
    Luu, Linda
    Ha, Sehoon
    Tan, Jie
    Yu, Wenhao
    2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022: 2454-2461