Backdoor attack detection via prediction trustworthiness assessment

Authors
Zhong, Nan [1]
Qian, Zhenxing [1]
Zhang, Xinpeng [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Backdoor attack; AI security; Trustworthy AI; Backdoor defence; Machine learning
DOI
10.1016/j.ins.2024.120283
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Backdoor attacks aim to compromise clean models without arousing suspicion: a poisoned model behaves normally on clean inputs yet returns adversary-desired results when a trigger appears. Because backdoor attacks are both insidious and hazardous, backdoor defences have attracted considerable attention in the machine learning security community. Unlike most backdoor mitigation defences, our defence aims to determine whether the prediction of the classifier is trustworthy. More specifically, we scrutinize whether the prediction result is determined by the adversary-defined trigger or by the semantic information of an input. To accomplish this goal, we devise a novel algorithm named feature aggregation, which requires only benign inputs and aims to separate the feature representation distributions of poisoned inputs from those of benign ones. Feature aggregation minimizes the distance between intra-benign feature representations and maximizes the distance between benign and poisoned feature representations. Then, we employ flow-based probability density estimation to model the distribution of benign feature representations. Since the likelihood of poisoned inputs under the estimated distribution is significantly smaller than that of benign ones, poisoned inputs can be identified based on an adaptive threshold. Experimental results show that our method outperforms state-of-the-art defences.
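The last detection step described in the abstract (fit a density on benign feature representations, then flag inputs whose likelihood falls below a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: a multivariate Gaussian stands in for the flow-based density estimator, the feature vectors are synthetic, and the threshold rule (a low quantile of benign log-likelihoods) is an assumption, since the abstract does not say how the adaptive threshold is chosen. All function names are hypothetical.

```python
import numpy as np

def fit_gaussian(features):
    """Fit a multivariate Gaussian to benign features (stand-in for a flow model)."""
    mean = features.mean(axis=0)
    # A small ridge term keeps the covariance matrix invertible.
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return mean, cov

def log_likelihood(features, mean, cov):
    """Per-sample Gaussian log-density of each feature vector."""
    diff = features - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    dim = features.shape[1]
    return -0.5 * (maha + logdet + dim * np.log(2 * np.pi))

def adaptive_threshold(benign_ll, false_reject_rate=0.05):
    """Assumed rule: threshold at a low quantile of benign log-likelihoods."""
    return np.quantile(benign_ll, false_reject_rate)

def is_untrustworthy(features, mean, cov, threshold):
    """Flag predictions whose features are unlikely under the benign density."""
    return log_likelihood(features, mean, cov) < threshold

# Synthetic features: poisoned ones are placed far from the benign cluster,
# mimicking the separation that feature aggregation is meant to produce.
rng = np.random.default_rng(0)
benign_train = rng.normal(0.0, 1.0, size=(2000, 8))
benign_test = rng.normal(0.0, 1.0, size=(500, 8))
poisoned_test = rng.normal(6.0, 1.0, size=(500, 8))

mean, cov = fit_gaussian(benign_train)
threshold = adaptive_threshold(log_likelihood(benign_train, mean, cov))
benign_flag_rate = is_untrustworthy(benign_test, mean, cov, threshold).mean()
poisoned_flag_rate = is_untrustworthy(poisoned_test, mean, cov, threshold).mean()
print(f"benign flagged: {benign_flag_rate:.3f}, poisoned flagged: {poisoned_flag_rate:.3f}")
```

In this toy setting the false-reject rate on benign inputs is set directly by the chosen quantile, which is one plausible reading of "adaptive"; a normalizing flow would replace `fit_gaussian` and `log_likelihood` when the benign feature distribution is not well approximated by a Gaussian.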
Pages: 15