Poisoning finite-horizon Markov decision processes at design time

Cited: 3
Authors
Caballero, William N. [1 ]
Jenkins, Phillip R. [1 ]
Keith, Andrew J. [2 ]
Affiliations
[1] Air Force Inst Technol, Dept Operat Sci, 2950 Hobson Way, Wright Patterson AFB, OH 45433 USA
[2] Air Force Studies Anal & Assessments, 1690 Air Force Pentagon, Washington, DC 20330 USA
Keywords
Markov decision process; Adversarial learning; Data poisoning; Machine learning; Reinforcement learning; ALGORITHMS;
DOI
10.1016/j.cor.2020.105185
Chinese Library Classification
TP39 [Applications of computers];
Discipline codes
081203; 0835;
Abstract
The contemporary decision-making environment is becoming increasingly automated. Developments in artificial intelligence, machine learning, and operations research have increased the prevalence of computer systems in decision-making tasks across a myriad of applications. Markov decision processes (MDPs) are utilized in a variety of system controllers, and attacks against them are of particular interest, even though this problem structure is relatively understudied in the adversarial learning literature. Therefore, in this research, we consider the finite-horizon MDP poisoning problem wherein an adversary perturbs a decision maker's baseline MDP formulation to induce desired behavior while balancing the risk of attack detection. We formally define the associated mathematical programming formulation as a mixed-integer bilevel programming problem. We provide a single-level representation that can be handled by some commercial global solvers, but, since their performance is frequently inadequate, we develop gradient-based, gradient-free, and bifurcation heuristic solution methodologies that include self-tuning extensions. The performance of these algorithms is explored on a wide array of sample problem instances to determine their relative efficacy in terms of solution quality and computational effort for different finite-horizon MDP structures. Published by Elsevier Ltd.
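For context, the decision maker in this setting solves a finite-horizon MDP, which is typically done by backward induction; the poisoning problem's lower level corresponds to this computation on the perturbed instance. A minimal sketch of backward induction (the function name and toy instance below are illustrative, not taken from the paper):

```python
import numpy as np

def backward_induction(P, R, H):
    """Solve a finite-horizon MDP by backward induction.

    P: (S, A, S) transition probabilities, R: (S, A) rewards, H: horizon.
    Returns value functions V[t] and greedy policies pi[t] for t = 0..H-1.
    """
    S, A, _ = P.shape
    V = np.zeros((H + 1, S))           # V[H] = 0 (zero terminal value)
    pi = np.zeros((H, S), dtype=int)
    for t in range(H - 1, -1, -1):
        # Q[s, a] = R[s, a] + sum_{s'} P[s, a, s'] * V[t+1][s']
        Q = R + P @ V[t + 1]
        pi[t] = Q.argmax(axis=1)
        V[t] = Q.max(axis=1)
    return V, pi

# Toy 2-state, 2-action instance: action 0 stays put, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0   # stay
P[0, 1, 1] = P[1, 1, 0] = 1.0   # switch
R = np.array([[1.0, 0.0], [0.0, 1.0]])  # reward 1 for "staying" in state 0, "switching" in state 1
V, pi = backward_induction(P, R, H=3)
```

An adversary in the paper's bilevel formulation would perturb `P` or `R` (subject to a detection-risk budget) so that the policy returned by this lower-level computation matches the behavior it wants to induce.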
Pages: 17