Transferability of features for neural networks links to adversarial attacks and defences

Times Cited: 2
Authors
Kotyan, Shashank [1 ]
Matsuki, Moe [2 ]
Vargas, Danilo Vasconcellos [1 ,3 ]
Affiliations
[1] Kyushu Univ, Dept Informat Sci & Engn, Fukuoka, Japan
[2] SoftBank Grp Corp, Tokyo, Japan
[3] Univ Tokyo, Sch Engn, Dept Elect Engn & Informat Syst, Tokyo, Japan
Source
PLOS ONE | 2022, Vol. 17, Issue 4
DOI
10.1371/journal.pone.0266060
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline Codes
07; 0710; 09;
Abstract
The reason for the existence of adversarial samples is still barely understood. Here, we explore the transferability of learned features to Out-of-Distribution (OoD) classes. We do this by assessing neural networks' capability to encode the existing features, revealing an intriguing connection with adversarial attacks and defences. The principal idea is that, if an algorithm learns rich features, such features should represent Out-of-Distribution classes as a combination of previously learned In-Distribution (ID) classes. This is because OoD classes usually share several regular features with ID classes, provided that the learned features are general enough. We further introduce two metrics to assess how well the transferred features represent OoD classes. One is based on inter-cluster validation techniques, while the other captures the influence of a class over learned features. Experiments suggest that several adversarial defences both reduce the success rate of some attacks and improve the transferability-of-features as measured by our metrics. Experiments also reveal a relationship between the proposed metrics and adversarial attacks (a high Pearson correlation coefficient and low p-value). Further, statistical tests suggest that several adversarial defences, in general, significantly improve transferability. Our tests suggest that models with higher transferability-of-features generally exhibit higher robustness against adversarial attacks. Thus, the experiments suggest that the objectives of adversarial machine learning might be much closer to those of domain transfer learning than previously thought.
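The abstract's methodology can be illustrated with a small sketch: score how well an OoD feature cluster separates from ID clusters (a simple stand-in for the paper's inter-cluster validation metric, whose exact definition is not given in the abstract), then correlate such a transferability score with robust accuracy across models using Pearson's r. All data, the `separation_score` function, and the numeric values below are hypothetical illustrations, not the authors' actual metric or results.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Hypothetical penultimate-layer features: two ID classes and one OoD class.
id_a = rng.normal(loc=0.0, scale=0.3, size=(50, 8))
id_b = rng.normal(loc=3.0, scale=0.3, size=(50, 8))
ood = rng.normal(loc=1.5, scale=0.3, size=(50, 8))  # lies between the ID clusters

def mean_pairwise_dist(x, y):
    """Mean Euclidean distance over all cross pairs from x and y."""
    diffs = x[:, None, :] - y[None, :, :]
    return np.linalg.norm(diffs, axis=-1).mean()

def separation_score(cluster, others):
    """Silhouette-style score: higher when `cluster` is compact
    relative to its distance from the nearest other cluster."""
    intra = mean_pairwise_dist(cluster, cluster)
    inter = min(mean_pairwise_dist(cluster, o) for o in others)
    return (inter - intra) / max(inter, intra)

score = separation_score(ood, [id_a, id_b])
print(f"OoD separation score: {score:.3f}")

# Toy correlation between a transferability metric and robust accuracy
# across five hypothetical models (values are illustrative only).
transferability = np.array([0.41, 0.55, 0.63, 0.70, 0.82])
robust_accuracy = np.array([0.22, 0.30, 0.35, 0.41, 0.52])
r, p = pearsonr(transferability, robust_accuracy)
print(f"Pearson r = {r:.3f}, p = {p:.4f}")
```

A high r with a low p-value on such paired measurements is the kind of evidence the abstract refers to when linking the proposed transferability metrics to adversarial robustness.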
Pages: 19