Spatio-Clock Synchronous Constraint Guided Safe Reinforcement Learning for Autonomous Driving

Cited by: 0
Authors
Wang J. [1,2]
Huang Z. [1,2]
Yang D. [3]
Huang X. [4]
Zhu Y. [3]
Hua G. [1,2]
Affiliations
[1] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing
[2] Key Laboratory of Safety-Critical Software (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing
[3] School of Computer Science and Technology, Jiangsu Normal University, Xuzhou
[4] Department of Computer Science, University of Liverpool, Liverpool
Funding
National Natural Science Foundation of China;
Keywords
Autonomous driving safety; Formal specification; Intelligent traffic simulation; Safe reinforcement learning; Spatio-clock synchronous constraint; Temporal difference;
DOI
10.7544/issn1000-1239.2021.20211023
Abstract
Autonomous driving systems involve complex interactions between hardware and software. To ensure safe and reliable operation, formal methods are applied at the design stage to provide rigorous guarantees that logical specifications and safety-critical requirements are satisfied. Deep reinforcement learning (DRL), a widely used machine learning paradigm, seeks an optimal policy that maximizes the cumulative discounted reward obtained by interacting with the environment, and has been applied to autonomous driving decision-making modules. However, black-box DRL-based autonomous driving systems can guarantee neither safe operation nor interpretable reward definitions for complex tasks, especially when they face unfamiliar situations and must reason over a larger set of options. To address these problems, spatio-clock synchronous constraints are adopted to improve the safety and interpretability of DRL. Firstly, we propose a dedicated formal property specification language for the autonomous driving domain, the spatio-clock synchronous constraint specification language, and present a domain-specific, near-natural-language requirements specification that makes the reward function generation process more interpretable. Secondly, we present domain-specific spatio-clock synchronous automata that describe spatio-clock autonomous behaviors, i.e., controllers for spatially and clock-critical actions, together with safe state-action space transition systems that guarantee the safety of the DRL optimal policy generation process. Thirdly, building on the formal specification and policy learning, we propose a formal spatio-clock synchronous constraint guided safe reinforcement learning method whose safe reward function is easy to understand. Finally, we demonstrate the effectiveness of the proposed approach through an autonomous lane changing and overtaking case study in a highway scenario. © 2021, Science Press. All rights reserved.
Pages: 2585-2603
Number of pages: 18
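The abstract above describes the approach only at a high level. As a rough, hypothetical illustration of the kind of mechanism it refers to, the Python sketch below shows how a simple spatio-clock predicate (a minimum time headway plus a minimum rear gap before a lane change) could both mask unsafe lane-change actions and shape the reward a DRL agent is trained on. Every identifier and threshold here (Observation, satisfies_constraint, safe_actions, shaped_reward, T_MIN, D_MIN) is an assumption made for this sketch, not the specification language, automata, or reward construction defined in the paper.

```python
from dataclasses import dataclass

# Hypothetical spatio-clock predicate: a lane change is allowed only if the
# time headway to the leader in the target lane exceeds T_MIN seconds (clock
# part) and the gap to the follower exceeds D_MIN meters (spatial part).
T_MIN = 2.0   # assumed clock threshold, seconds
D_MIN = 10.0  # assumed spatial threshold, meters

@dataclass
class Observation:
    ego_speed: float        # ego vehicle speed, m/s
    lead_gap_target: float  # distance to the leader in the target lane, m
    rear_gap_target: float  # distance to the follower in the target lane, m

ACTIONS = ["KEEP_LANE", "CHANGE_LEFT", "CHANGE_RIGHT"]

def satisfies_constraint(obs: Observation) -> bool:
    """Evaluate the spatio-clock predicate for a lane change."""
    headway = obs.lead_gap_target / max(obs.ego_speed, 1e-3)  # seconds
    return headway >= T_MIN and obs.rear_gap_target >= D_MIN

def safe_actions(obs: Observation) -> list[str]:
    """Shield: restrict the action set to the safe subset, so the learned
    policy never samples a lane change that violates the constraint."""
    return ACTIONS if satisfies_constraint(obs) else ["KEEP_LANE"]

def shaped_reward(base_reward: float, obs: Observation, action: str) -> float:
    """Interpretable reward shaping: penalize constraint violations directly
    instead of relying on an opaque, hand-tuned crash penalty."""
    if action != "KEEP_LANE" and not satisfies_constraint(obs):
        return base_reward - 1.0
    return base_reward
```

During training, a temporal-difference learner would then sample only from safe_actions(obs) and receive shaped_reward(...), which conveys the general idea of confining policy search to a safe state-action subspace with a constraint-derived reward; the paper's actual spatio-clock synchronous constraint specification language and automata are substantially richer than this toy predicate.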