Towards Automating Model Explanations with Certified Robustness Guarantees

Authors
Huai, Mengdi [1 ]
Liu, Jinduo [2 ]
Miao, Chenglin [3 ]
Yao, Liuyi [4 ]
Zhang, Aidong [1 ]
Affiliations
[1] Univ Virginia, Charlottesville, VA 22903 USA
[2] Beijing Univ Technol, Beijing, Peoples R China
[3] Univ Georgia, Athens, GA 30602 USA
[4] Alibaba Grp, Hangzhou, Zhejiang, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Model explanations have gained significant popularity recently. In contrast to traditional feature-level explanations, concept-based explanations are expressed in terms of high-level human concepts. However, existing concept-based explanation methods implicitly follow a two-step procedure that requires human intervention: a human must first define (or extract) the high-level concepts, and the importance scores of these concepts are then computed manually in a post-hoc way. This laborious process demands significant human effort and resources, which hinders large-scale deployment. In practice, it is challenging to generate concept-based explanations automatically, without human intervention, because defining the units of concept-based interpretability is subjective. In addition, due to its data-driven nature, interpretability itself is potentially susceptible to malicious manipulation. Hence, our goal in this paper is to free humans from this tedious process while ensuring that the generated explanations are provably robust to adversarial perturbations. We propose a novel concept-based interpretation method that not only automatically provides prototype-based concept explanations but also provides certified robustness guarantees for them. We also conduct extensive experiments on real-world datasets to verify the desirable properties of the proposed method.
Pages: 6935-6943
Number of pages: 9