MDPFuzz: Testing Models Solving Markov Decision Processes

Cited by: 10
Authors: Pang, Qi [1]; Yuan, Yuanyuan [1]; Wang, Shuai [1]
Affiliation: [1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Keywords: Deep learning testing; Markov decision procedure
DOI: 10.1145/3533767.3534388
CLC classification: TP31 [Computer Software]
Discipline codes: 081202; 0835
Abstract:
The Markov decision process (MDP) provides a mathematical framework for modeling sequential decision-making problems, many of which are crucial to security and safety, such as autonomous driving and robot control. The rapid development of artificial intelligence research has created efficient methods for solving MDPs, such as deep neural networks (DNNs), reinforcement learning (RL), and imitation learning (IL). However, these popular models solving MDPs are neither thoroughly tested nor rigorously reliable. We present MDPFuzz, the first blackbox fuzz testing framework for models solving MDPs. MDPFuzz forms testing oracles by checking whether the target model enters abnormal and dangerous states. During fuzzing, MDPFuzz decides which mutated state to retain by measuring if it can reduce cumulative rewards or form a new state sequence. We design efficient techniques to quantify the "freshness" of a state sequence using Gaussian mixture models (GMMs) and dynamic expectation-maximization (DynEM). We also prioritize states with high potential of revealing crashes by estimating the local sensitivity of target models over states. MDPFuzz is evaluated on five state-of-the-art models for solving MDPs, including supervised DNN, RL, IL, and multi-agent RL. Our evaluation includes scenarios of autonomous driving, aircraft collision avoidance, and two games that are often used to benchmark RL. During a 12-hour run, we find over 80 crash-triggering state sequences on each model. We show inspiring findings that crash-triggering states, though they look normal, induce distinct neuron activation patterns compared with normal states. We further develop an abnormal behavior detector to harden all the evaluated models and repair them with the findings of MDPFuzz to significantly enhance their robustness without sacrificing accuracy.
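The fuzzing loop the abstract describes (mutate a state, retain the mutant if it reduces cumulative reward, flag crashes via the testing oracle) can be sketched as below. This is a minimal illustration, not the authors' implementation: all names are hypothetical, and MDPFuzz's GMM/DynEM freshness measure and sensitivity-based prioritization are simplified here to picking the lowest-reward seed.

```python
import random

def rollout(env_step, state, horizon=20):
    """Run the black-box model/environment and return the cumulative reward."""
    total = 0.0
    for _ in range(horizon):
        state, reward = env_step(state)
        total += reward
    return total

def mutate(state, eps=0.1):
    """Apply a small random perturbation to every state dimension."""
    return [x + random.uniform(-eps, eps) for x in state]

def fuzz(env_step, seed_states, n_iters=100, crash_reward=0.0):
    """MDPFuzz-style loop: keep mutants that reduce cumulative reward;
    report those whose reward falls to the crash threshold (the oracle)."""
    corpus = [(s, rollout(env_step, s)) for s in seed_states]
    crashes = []
    for _ in range(n_iters):
        # Prioritize the lowest-reward seed (the actual tool uses
        # sensitivity estimates and a GMM-based freshness score here).
        state, reward = min(corpus, key=lambda c: c[1])
        mutant = mutate(state)
        new_reward = rollout(env_step, mutant)
        if new_reward <= crash_reward:
            crashes.append(mutant)               # oracle: abnormal/dangerous outcome
        elif new_reward < reward:
            corpus.append((mutant, new_reward))  # retain reward-reducing mutants
    return crashes
```

The crash threshold stands in for the paper's oracle of "abnormal and dangerous states"; in a real target (e.g., autonomous driving), the oracle would check for collisions rather than a reward bound.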
Pages: 378-390 (13 pages)
Related Papers (50 items)
  • [21] An evolutionary random policy search algorithm for solving Markov decision processes. Hu, Jiaqiao; Fu, Michael C.; Ramezani, Vahid R.; Marcus, Steven I. INFORMS Journal on Computing, 2007, 19(2): 161-174.
  • [22] Exact and approximate solving of Markov decision processes using propositional logic. Lesner, Boris; Zanuttini, Bruno. Revue d'Intelligence Artificielle, 2010, 24(2): 131-158.
  • [23] A cooperative distributed problem solving technique for large Markov decision processes. Mouaddib, Abdel-Illah; Le Gloannec, Simon. ECAI 2006, Proceedings, 2006, 141: 843+.
  • [24] An improved algorithm for solving communicating average reward Markov decision processes. Haviv, Moshe; Puterman, Martin L. Annals of Operations Research, 1991, 28(1): 229-242.
  • [25] Facilitating testing and debugging of Markov decision processes with interactive visualization. McGregor, Sean; Buckingham, Hailey; Dietterich, Thomas G.; Houtman, Rachel; Montgomery, Claire; Metoyer, Ronald. Proceedings 2015 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2015: 53-61.
  • [26] Facilitating testing and debugging of Markov decision processes with interactive visualization. McGregor, Sean; Buckingham, Hailey; Dietterich, Thomas G.; Houtman, Rachel; Montgomery, Claire; Metoyer, Ronald. Proceedings 2015 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2015: 281-282.
  • [27] Equivalence classes for optimizing risk models in Markov decision processes. Ohtsubo, Y.; Toyonaga, K. Mathematical Methods of Operations Research, 2004, 60(2): 239-250.
  • [29] Optimal policy for minimizing risk models in Markov decision processes. Ohtsubo, Y.; Toyonaga, K. Journal of Mathematical Analysis and Applications, 2002, 271(1): 66-81.
  • [30] Markov decision processes. Bäuerle, N.; Rieder, U. Jahresbericht der Deutschen Mathematiker-Vereinigung, 2010, 112(4): 217-243.