MDPFuzz: Testing Models Solving Markov Decision Processes

Cited by: 10
Authors
Pang, Qi [1 ]
Yuan, Yuanyuan [1 ]
Wang, Shuai [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Keywords
Deep learning testing; Markov decision process
DOI
10.1145/3533767.3534388
CLC Number
TP31 [Computer Software]
Discipline Classification Code
081202; 0835
Abstract
The Markov decision process (MDP) provides a mathematical framework for modeling sequential decision-making problems, many of which are crucial to security and safety, such as autonomous driving and robot control. The rapid development of artificial intelligence research has created efficient methods for solving MDPs, such as deep neural networks (DNNs), reinforcement learning (RL), and imitation learning (IL). However, these popular models solving MDPs are neither thoroughly tested nor rigorously reliable. We present MDPFuzz, the first blackbox fuzz testing framework for models solving MDPs. MDPFuzz forms testing oracles by checking whether the target model enters abnormal and dangerous states. During fuzzing, MDPFuzz decides which mutated state to retain by measuring if it can reduce cumulative rewards or form a new state sequence. We design efficient techniques to quantify the "freshness" of a state sequence using Gaussian mixture models (GMMs) and dynamic expectation-maximization (DynEM). We also prioritize states with high potential of revealing crashes by estimating the local sensitivity of target models over states. MDPFuzz is evaluated on five state-of-the-art models for solving MDPs, including supervised DNN, RL, IL, and multi-agent RL. Our evaluation includes scenarios of autonomous driving, aircraft collision avoidance, and two games that are often used to benchmark RL. During a 12-hour run, we find over 80 crash-triggering state sequences on each model. We show inspiring findings that crash-triggering states, though they look normal, induce distinct neuron activation patterns compared with normal states. We further develop an abnormal behavior detector to harden all the evaluated models and repair them with the findings of MDPFuzz to significantly enhance their robustness without sacrificing accuracy.
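To make the fuzzing loop described in the abstract concrete, the sketch below illustrates the kind of seed-retention logic MDPFuzz describes: a mutated initial state is kept if it reduces the cumulative reward or if its state sequence looks "fresh" under a Gaussian mixture model fit to previously seen states, and any rollout that hits an abnormal state is recorded as a crash. This is only a minimal illustration under assumed interfaces: the Gym-style env, the model.predict call, the mutate noise operator, the freshness_threshold, and the one-shot GMM fit are placeholders, and the paper's dynamic EM (DynEM) updates and sensitivity-based seed prioritization are not reproduced here.

import numpy as np
from sklearn.mixture import GaussianMixture


def mutate(state, scale=0.01, rng=np.random.default_rng()):
    # Hypothetical mutation operator: small Gaussian perturbation of the
    # initial state vector (MDPFuzz's actual mutations are task-specific).
    return state + rng.normal(0.0, scale, size=np.shape(state))


def rollout(env, model, init_state, max_steps=500):
    # Run the target model from a given initial state; return cumulative
    # reward, the visited states, and whether the testing oracle fired.
    # `env` is assumed to expose a Gym-like reset(state)/step(action)
    # interface and to flag abnormal states via info["crash"].
    state = env.reset(init_state)
    states, total_reward, crashed = [state], 0.0, False
    for _ in range(max_steps):
        action = model.predict(state)
        state, reward, done, info = env.step(action)
        total_reward += reward
        states.append(state)
        if info.get("crash", False):
            crashed = True
            break
        if done:
            break
    return total_reward, np.asarray(states), crashed


def freshness(gmm, states):
    # Lower average log-likelihood under the GMM of previously seen states
    # means the sequence covers less-explored regions, i.e. is "fresher".
    return -gmm.score(states)


def fuzz(env, model, seeds, iterations=1000, freshness_threshold=5.0,
         rng=np.random.default_rng()):
    corpus, rewards, crashes, all_states = [], [], [], []
    for s in seeds:
        r, states, _ = rollout(env, model, s)
        corpus.append(s)
        rewards.append(r)
        all_states.append(states)
    # One GMM over the states seen so far; the paper keeps this density
    # model current with DynEM, whereas this sketch fits it only once.
    gmm = GaussianMixture(n_components=8).fit(np.concatenate(all_states))

    for _ in range(iterations):
        idx = rng.integers(len(corpus))
        candidate = mutate(corpus[idx])
        reward, states, crashed = rollout(env, model, candidate)
        if crashed:
            crashes.append(candidate)  # oracle violation found
            continue
        # Retain the mutant if it reduces cumulative reward or yields a
        # fresh state sequence, mirroring the retention rule in the abstract.
        if reward < rewards[idx] or freshness(gmm, states) > freshness_threshold:
            corpus.append(candidate)
            rewards.append(reward)
    return crashes

The retention rule only grows the corpus with mutants that are potentially more failure-prone or that explore new regions of the state space, which keeps the fuzzing budget focused; the sensitivity-guided prioritization that MDPFuzz additionally uses to rank seeds is omitted from this sketch.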
Pages: 378-390
Number of pages: 13
Related Papers
50 records in total
  • [1] Decomposition methods for solving Markov decision processes with multiple models of the parameters
    Steimle, Lauren N.
    Ahluwalia, Vinayak S.
    Kamdar, Charmee
    Denton, Brian T.
    [J]. IISE TRANSACTIONS, 2021, 53 (12) : 1295 - 1310
  • [2] Solving concurrent Markov decision processes
    Mausam
    Weld, DS
    [J]. PROCEEDINGS OF THE NINETEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE SIXTEENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2004, : 716 - 722
  • [3] Solving hybrid Markov decision processes
    Reyes, Alberto
    Sucar, L. Enrique
    Morales, Eduardo F.
    Ibarguengoytia, Pablo H.
    [J]. MICAI 2006: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4293 : 227 - +
  • [4] Efficient Model Solving for Markov Decision Processes
    Sapio, Adrian
    Bhattacharyya, Shuvra S.
    Wolf, Marilyn
    [J]. 2020 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC), 2020, : 607 - 611
  • [5] Ordinal Decision Models for Markov Decision Processes
    Weng, Paul
    [J]. 20TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2012), 2012, 242 : 828 - 833
  • [6] Solving Markov Decision Processes with Downside Risk Adjustment
    Abhijit Gosavi
    Anish Parulekar
    [J]. Machine Intelligence Research, 2016, 13 (03) : 235 - 245
  • [7] Solving Markov decision processes with downside risk adjustment
    Gosavi A.
    Parulekar A.
    [J]. International Journal of Automation and Computing, 2016, 13 (3) : 235 - 245
  • [8] Solving transition independent decentralized Markov decision processes
    Becker, R
    Zilberstein, S
    Lesser, V
    Goldman, CV
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2004, 22 : 423 - 455
  • [9] Solving Markov Decision Processes with Partial State Abstractions
    Nashed, Samer B.
    Svegliato, Justin
    Brucato, Matteo
    Basich, Connor
    Grupen, Rod
    Zilberstein, Shlomo
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 813 - 819
  • [10] Evolutionary policy iteration for solving Markov decision processes
    Chang, HS
    Lee, HG
    Fu, MC
    Marcus, SI
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2005, 50 (11) : 1804 - 1808