Backdoor attack detection via prediction trustworthiness assessment

Citations: 0

Authors:

Zhong, Nan [1]

Qian, Zhenxing [1]

Zhang, Xinpeng [1]

Affiliation:

[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China

Source:

INFORMATION SCIENCES | 2024 / Vol. 662

Funding:

National Natural Science Foundation of China;

Keywords:

Backdoor attack; AI security; Trustworthy AI; Backdoor defence; Machine learning;

DOI:

10.1016/j.ins.2024.120283

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Backdoor attacks aim to compromise clean models without arousing suspicion: poisoned models behave normally on clean inputs yet return adversary-desired results when triggers appear. Because backdoor attacks are highly insidious and hazardous, backdoor defences have attracted considerable attention in the machine learning security community. Unlike most backdoor mitigation defences, our defence aims to determine whether the prediction of the classifier is trustworthy. More specifically, we scrutinize whether the prediction result is determined by the adversary-defined trigger or by the semantic information of an input. To accomplish this goal, we devise a novel algorithm named feature aggregation, which requires only benign inputs and aims to separate the feature representation distributions of poisoned inputs from those of benign ones. Feature aggregation minimizes the distance among benign feature representations and maximizes the distance between benign and poisoned feature representations. Then, we employ flow-based probability density estimation to model the distribution of benign feature representations. Since the likelihood of poisoned inputs under the estimated distribution is significantly smaller than that of benign ones, they can be identified based on an adaptive threshold. Experimental results show that our method outperforms state-of-the-art defences.
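
The last two steps of the abstract (model the benign feature distribution, then flag inputs whose likelihood falls below a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: a single multivariate Gaussian stands in for the flow-based density estimator, the threshold is simply a low quantile of benign log-likelihoods rather than the paper's adaptive threshold, and the function names and the `false_reject_rate` parameter are hypothetical. The feature aggregation training step is not shown.

```python
import numpy as np


def fit_gaussian(benign_features):
    """Fit a multivariate Gaussian to benign feature vectors.

    Assumption: a Gaussian is used here only as a simple stand-in for the
    flow-based density estimator mentioned in the abstract.
    """
    mean = benign_features.mean(axis=0)
    dim = benign_features.shape[1]
    # A small ridge term keeps the covariance matrix invertible.
    cov = np.cov(benign_features, rowvar=False) + 1e-6 * np.eye(dim)
    return mean, cov


def log_likelihood(features, mean, cov):
    """Per-sample Gaussian log-density of each row in `features`."""
    dim = mean.shape[0]
    diff = features - mean
    _, logdet = np.linalg.slogdet(cov)
    solved = np.linalg.solve(cov, diff.T).T
    mahalanobis_sq = np.einsum("ij,ij->i", diff, solved)
    return -0.5 * (mahalanobis_sq + logdet + dim * np.log(2.0 * np.pi))


def choose_threshold(benign_log_likelihoods, false_reject_rate=0.05):
    """Hypothetical threshold rule: a low quantile of benign log-likelihoods."""
    return np.quantile(benign_log_likelihoods, false_reject_rate)


def is_trustworthy(features, mean, cov, threshold):
    """True where the log-likelihood is at or above the threshold."""
    return log_likelihood(features, mean, cov) >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic features: benign samples cluster together, and the
    # "poisoned" samples are shifted far away (an idealised separation).
    benign_train = rng.normal(0.0, 1.0, size=(2000, 8))
    benign_val = rng.normal(0.0, 1.0, size=(500, 8))
    poisoned = rng.normal(6.0, 1.0, size=(200, 8))

    mean, cov = fit_gaussian(benign_train)
    threshold = choose_threshold(log_likelihood(benign_val, mean, cov))

    print("benign accepted:", is_trustworthy(benign_val, mean, cov, threshold).mean())
    print("poisoned accepted:", is_trustworthy(poisoned, mean, cov, threshold).mean())
```

The sketch only works as well as the features separate: it assumes that the upstream feature extractor already pushes poisoned representations away from benign ones, which is the role the abstract assigns to feature aggregation.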


Pages: 15
