Provable Defense against Backdoor Policies in Reinforcement Learning

Cited by: 0
Authors
Bharti, Shubham Kumar [1 ]
Zhang, Xuezhou [2 ]
Singla, Adish [3 ]
Zhu, Xiaojin [1 ]
Affiliations
[1] UW Madison, Madison, WI 53706 USA
[2] Princeton Univ, Princeton, NJ USA
[3] MPI SWS, Saarbrucken, Germany
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We propose a provable defense mechanism against backdoor policies in reinforcement learning under a subspace trigger assumption. A backdoor policy is a security threat in which an adversary publishes a seemingly well-behaved policy that in fact contains hidden triggers. During deployment, the adversary can modify observed states in a particular way to trigger unexpected actions and harm the agent. We assume the agent does not have the resources to re-train a good policy. Instead, our defense mechanism sanitizes the backdoor policy by projecting observed states onto a 'safe subspace', estimated from a small number of interactions with a clean (non-triggered) environment. Our sanitized policy achieves epsilon-approximate optimality in the presence of triggers, provided the number of clean interactions is O(D / ((1 - gamma)^4 * epsilon^2)), where gamma is the discount factor and D is the dimension of the state space. Empirically, we show that our sanitization defense performs well on two Atari game environments.
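The sanitization mechanism described in the abstract can be sketched as follows: estimate the safe subspace from clean observations (here via the top singular vectors of a clean-state matrix) and project every observed state onto it before passing it to the policy, so that any trigger component outside the subspace is removed. This is a minimal illustrative sketch, not the paper's implementation; the subspace dimension `k`, the state dimension `D`, and all variable names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: clean states lie in a low-dimensional "safe subspace"
# of the D-dimensional observation space.
D, k, n_clean = 20, 3, 500
basis = np.linalg.qr(rng.normal(size=(D, k)))[0]        # true safe subspace (D x k)
clean_states = rng.normal(size=(n_clean, k)) @ basis.T  # clean observations (n_clean x D)

# Estimate the safe subspace from clean interactions: take the top-k
# right singular vectors of the clean-state matrix.
_, _, vt = np.linalg.svd(clean_states, full_matrices=False)
V = vt[:k].T          # estimated subspace basis (D x k)
P = V @ V.T           # orthogonal projection onto the estimated subspace

# A triggered observation = clean state + trigger outside the subspace.
clean = rng.normal(size=k) @ basis.T
trigger = rng.normal(size=D)
trigger -= P @ trigger            # make the trigger orthogonal to the subspace
triggered = clean + trigger

# Sanitize: project the observed state before feeding it to the policy.
sanitized = P @ triggered         # recovers the clean state; trigger removed
```

Because the projection is applied to every observation, the (fixed) backdoor policy only ever sees states in the safe subspace, so the trigger can no longer activate the hidden behavior.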
Pages: 11