Towards Automating Model Explanations with Certified Robustness Guarantees

Cited by: 0
Authors
Huai, Mengdi [1]
Liu, Jinduo [2]
Miao, Chenglin [3]
Yao, Liuyi [4]
Zhang, Aidong [1]
Affiliations
[1] University of Virginia, Charlottesville, VA 22903, USA
[2] Beijing University of Technology, Beijing, People's Republic of China
[3] University of Georgia, Athens, GA 30602, USA
[4] Alibaba Group, Hangzhou, Zhejiang, People's Republic of China
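Funding
US National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract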
Model explanations have gained significant popularity recently. In contrast with traditional feature-level model explanations, concept-based explanations are expressed in terms of high-level human concepts. However, existing concept-based explanation methods implicitly follow a two-step procedure that involves human intervention. Specifically, they first require a human to define (or extract) the high-level concepts, and the importance scores of these concepts are then computed manually in a post-hoc way. This laborious process requires significant human effort and resources, which hinders large-scale deployment. In practice, it is challenging to generate concept-based explanations automatically, without human intervention, because defining the units of concept-based interpretability is subjective. In addition, due to its data-driven nature, interpretability itself is potentially susceptible to malicious manipulation. Hence, our goal in this paper is to free humans from this tedious process while ensuring that the generated explanations are provably robust to adversarial perturbations. We propose a novel concept-based interpretation method that not only automatically provides prototype-based concept explanations but also provides certified robustness guarantees for them. We also conduct extensive experiments on real-world datasets to verify the desirable properties of the proposed method.
Pages: 6935-6943
Page count: 9