Probabilistic Policy Reuse for Safe Reinforcement Learning

Cited: 6
Authors
Garcia, Javier [1]
Fernandez, Fernando [1]
Institutions
[1] Univ Carlos III Madrid, Ave Univ, 30, Leganes 28911, Spain
Keywords
Reinforcement learning; case-based reasoning; software agents
DOI
10.1145/3310090
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous reinforcement learning problems with continuous state and action spaces, under the assumptions that the dynamics are reasonably smooth and the state space is Euclidean. The algorithm uses a monotonically increasing risk function that estimates the probability of ending up in failure from a given state; the risk of a state is defined in terms of how far it lies from the region of the state space already known to the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of the knowledge learned so far, the exploration of new actions, and requests for teacher advice in parts of the state space considered dangerous. Specifically, the π-reuse exploration strategy is used. Through experiments in the helicopter hover task and a business management problem, we show that the π-reuse exploration strategy can completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy.
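The abstract describes two mechanisms precisely enough to sketch: a distance-based risk estimate over a Euclidean state space, and a π-reuse rule that mixes teacher advice, exploitation, and exploration. Below is a minimal illustrative sketch in Python; all names (`risk`, `pi_reuse_action`, `d_safe`, `psi`, and the stand-in policies) are assumptions for illustration, not the authors' implementation, and coupling the reuse probability to the risk estimate is one plausible reading of the abstract.

```python
import numpy as np

def risk(state, known_states, d_safe=1.0):
    """Monotonically increasing risk in the Euclidean distance from the
    states the agent already knows; saturates at 1 (certain failure).
    A sketch of the abstract's risk function, not the paper's definition."""
    d = min(np.linalg.norm(state - s) for s in known_states)
    return min(d / d_safe, 1.0)

def pi_reuse_action(state, teacher_policy, learned_policy, known_states,
                    psi=0.5, epsilon=0.1, rng=np.random):
    """One pi-reuse exploration step: follow the teacher with probability
    psi, otherwise act epsilon-greedily with the learned policy. Raising
    psi with the risk estimate (an assumption here) requests teacher
    advice more often in dangerous, poorly known parts of the space."""
    psi_eff = max(psi, risk(state, known_states))  # more reuse where riskier
    if rng.random() < psi_eff:
        return teacher_policy(state)               # request teacher advice
    if rng.random() < epsilon:
        # explore: perturb the learned action (continuous action space)
        return learned_policy(state) + rng.normal(scale=0.1)
    return learned_policy(state)                   # exploit learned knowledge

# Toy usage: 2-D states, scalar actions, stand-in policies.
known = [np.zeros(2), np.array([0.5, 0.5])]
teacher = lambda s: 0.0                 # stand-in safe controller
learned = lambda s: float(s.sum())      # stand-in learned policy
a = pi_reuse_action(np.array([2.0, 2.0]), teacher, learned, known)
```

In this reading, `psi` plays the role of the π-reuse teacher-advice probability, and the risk term pushes it toward 1 as the agent drifts away from the states it already knows, which is how the sketch avoids free exploration in unfamiliar regions.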
Pages: 24
Related Papers
50 items in total
  • [31] Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization. Wen, Lu; Duan, Jingliang; Li, Shengbo Eben; Xu, Shaobing; Peng, Huei. 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), 2020.
  • [32] Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis. Mitta, Rohan; Hasanbeig, Hosein; Wang, Jun; Kroening, Daniel; Kantaros, Yiannis; Abate, Alessandro. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 19, 2024: 21412-21419.
  • [33] Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning. Horie, Naoto; Matsui, Tohgoroh; Moriyama, Koichi; Mutoh, Atsuko; Inuzuka, Nobuhiro. Artificial Life and Robotics, 2019, 24(3): 352-359.
  • [34] Safe Reinforcement Learning: A Survey. Wang, X.-S.; Wang, R.-R.; Cheng, Y.-H. Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49(9): 1813-1835.
  • [36] Safe Reinforcement Learning with Probabilistic Guarantees Satisfying Temporal Logic Specifications in Continuous Action Spaces. Krasowski, Hanna; Akella, Prithvi; Ames, Aaron D.; Althoff, Matthias. 2023 62nd IEEE Conference on Decision and Control (CDC), 2023: 4372-4378.
  • [37] Quantum error correction for heavy hexagonal code using deep reinforcement learning with policy reuse. Ji, Yuxin; Chen, Qinghui; Wang, Rui; Ji, Naihua; Ma, Hongyang. Quantum Information Processing, 2024, 23(7).
  • [38] Off-policy safe reinforcement learning for nonlinear discrete-time systems. Jha, Mayank Shekhar; Kiumarsi, Bahare. Neurocomputing, 2025, 611.
  • [39] Joint Synthesis of Safety Certificate and Safe Control Policy using Constrained Reinforcement Learning. Ma, Haitong; Liu, Changliu; Li, Shengbo Eben; Zheng, Sifa; Chen, Jianyu. Learning for Dynamics and Control Conference, Vol. 168, 2022.
  • [40] Safe Reinforcement Learning-based Driving Policy Design for Autonomous Vehicles on Highways. Nguyen, Hung Duy; Han, Kyoungseok. International Journal of Control, Automation and Systems, 2023, 21(12): 4098-4110.