MDPFuzz: Testing Models Solving Markov Decision Processes

Cited by: 10
Authors: Pang, Qi [1]; Yuan, Yuanyuan [1]; Wang, Shuai [1]
Affiliation: [1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Keywords: Deep learning testing; Markov decision procedure
DOI: 10.1145/3533767.3534388
CLC classification: TP31 [Computer Software]
Discipline codes: 081202; 0835
Abstract:
The Markov decision process (MDP) provides a mathematical framework for modeling sequential decision-making problems, many of which are crucial to security and safety, such as autonomous driving and robot control. The rapid development of artificial intelligence research has created efficient methods for solving MDPs, such as deep neural networks (DNNs), reinforcement learning (RL), and imitation learning (IL). However, these popular models solving MDPs are neither thoroughly tested nor rigorously reliable. We present MDPFuzz, the first blackbox fuzz testing framework for models solving MDPs. MDPFuzz forms testing oracles by checking whether the target model enters abnormal and dangerous states. During fuzzing, MDPFuzz decides which mutated state to retain by measuring if it can reduce cumulative rewards or form a new state sequence. We design efficient techniques to quantify the "freshness" of a state sequence using Gaussian mixture models (GMMs) and dynamic expectation-maximization (DynEM). We also prioritize states with high potential of revealing crashes by estimating the local sensitivity of target models over states. MDPFuzz is evaluated on five state-of-the-art models for solving MDPs, including supervised DNN, RL, IL, and multi-agent RL. Our evaluation includes scenarios of autonomous driving, aircraft collision avoidance, and two games that are often used to benchmark RL. During a 12-hour run, we find over 80 crash-triggering state sequences on each model. We show inspiring findings that crash-triggering states, though they look normal, induce distinct neuron activation patterns compared with normal states. We further develop an abnormal behavior detector to harden all the evaluated models and repair them with the findings of MDPFuzz to significantly enhance their robustness without sacrificing accuracy.
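The fuzzing loop the abstract describes (mutate a state, retain the mutant if it reduces cumulative reward, flag crashes via the testing oracle) can be sketched as below. This is a minimal illustration, not the authors' implementation: all names are hypothetical, and MDPFuzz's GMM/DynEM freshness measure and sensitivity-based prioritization are simplified here to picking the lowest-reward seed.

```python
import random

def rollout(env_step, state, horizon=20):
    """Run the black-box model/environment and return the cumulative reward."""
    total = 0.0
    for _ in range(horizon):
        state, reward = env_step(state)
        total += reward
    return total

def mutate(state, eps=0.1):
    """Apply a small random perturbation to every state dimension."""
    return [x + random.uniform(-eps, eps) for x in state]

def fuzz(env_step, seed_states, n_iters=100, crash_reward=0.0):
    """MDPFuzz-style loop: keep mutants that reduce cumulative reward;
    report those whose reward falls to the crash threshold (the oracle)."""
    corpus = [(s, rollout(env_step, s)) for s in seed_states]
    crashes = []
    for _ in range(n_iters):
        # Prioritize the lowest-reward seed (the actual tool uses
        # sensitivity estimates and a GMM-based freshness score here).
        state, reward = min(corpus, key=lambda c: c[1])
        mutant = mutate(state)
        new_reward = rollout(env_step, mutant)
        if new_reward <= crash_reward:
            crashes.append(mutant)               # oracle: abnormal/dangerous outcome
        elif new_reward < reward:
            corpus.append((mutant, new_reward))  # retain reward-reducing mutants
    return crashes
```

The crash threshold stands in for the paper's oracle of "abnormal and dangerous states"; in a real target (e.g., autonomous driving), the oracle would check for collisions rather than a reward bound.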
Pages: 378-390 (13 pages)
Related Papers (50 items)
  • [21] An evolutionary random policy search algorithm for solving Markov decision processes. Hu, Jiaqiao; Fu, Michael C.; Ramezani, Vahid R.; Marcus, Steven I. INFORMS Journal on Computing, 2007, 19(2): 161-174.
  • [22] Exact and approximate solving of Markov decision processes using propositional logic. Lesner, Boris; Zanuttini, Bruno. Revue d'Intelligence Artificielle, 2010, 24(2): 131-158.
  • [23] A cooperative distributed problem solving technique for large Markov decision processes. Mouaddib, Abdel-Illah; Le Gloannec, Simon. ECAI 2006, Proceedings, 2006, 141: 843+.
  • [24] An improved algorithm for solving communicating average reward Markov decision processes. Haviv, Moshe; Puterman, Martin L. Annals of Operations Research, 1991, 28(1): 229-242.
  • [25] Facilitating testing and debugging of Markov decision processes with interactive visualization. McGregor, Sean; Buckingham, Hailey; Dietterich, Thomas G.; Houtman, Rachel; Montgomery, Claire; Metoyer, Ronald. Proceedings 2015 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2015: 53-61.
  • [26] Facilitating testing and debugging of Markov decision processes with interactive visualization. McGregor, Sean; Buckingham, Hailey; Dietterich, Thomas G.; Houtman, Rachel; Montgomery, Claire; Metoyer, Ronald. Proceedings 2015 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2015: 281-282.
  • [27] Equivalence classes for optimizing risk models in Markov decision processes. Ohtsubo, Y.; Toyonaga, K. Mathematical Methods of Operations Research, 2004, 60(2): 239-250.
  • [29] Optimal policy for minimizing risk models in Markov decision processes. Ohtsubo, Y.; Toyonaga, K. Journal of Mathematical Analysis and Applications, 2002, 271(1): 66-81.
  • [30] Markov decision processes. Bäuerle, N.; Rieder, U. Jahresbericht der Deutschen Mathematiker-Vereinigung, 2010, 112(4): 217-243.