Concept whitening for interpretable image recognition

Cited by: 151
Authors
Chen, Zhi [1 ]
Bei, Yijie [2 ]
Rudin, Cynthia [1 ,2 ]
Affiliations
[1] Duke Univ, Dept Comp Sci, Durham, NC 27706 USA
[2] Duke Univ, Dept Elect & Comp Engn, Durham, NC USA
Funding
US National Science Foundation;
DOI
10.1038/s42256-020-00265-z
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
What does a neural network encode about a concept as we traverse through the layers? Interpretability in machine learning is undoubtedly important, but the calculations of neural networks are very challenging to understand. Attempts to see inside their hidden layers can be misleading, unusable or rely on the latent space to possess properties that it may not have. Here, rather than attempting to analyse a neural network post hoc, we introduce a mechanism, called concept whitening (CW), to alter a given layer of the network to allow us to better understand the computation leading up to that layer. When a concept whitening module is added to a convolutional neural network, the latent space is whitened (that is, decorrelated and normalized) and the axes of the latent space are aligned with known concepts of interest. By experiment, we show that CW can provide us with a much clearer understanding of how the network gradually learns concepts over layers. CW is an alternative to a batch normalization layer in that it normalizes, and also decorrelates (whitens), the latent space. CW can be used in any layer of the network without hurting predictive performance. There is much interest in 'explainable' AI, but most efforts concern post hoc methods. Instead, a neural network can be made inherently interpretable, with an approach that involves making human-understandable concepts (aeroplane, bed, lamp and so on) align along the axes of its latent space.
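The abstract names two geometric operations: whitening the latent space (decorrelation plus normalization) and rotating its axes onto labelled concepts. The NumPy sketch below illustrates that geometry only; it is not the authors' implementation (the published CW module learns the rotation by gradient-based optimization over orthogonal matrices during training), and the names zca_whiten, align_axes and concept_dirs are illustrative.

```python
import numpy as np

def zca_whiten(Z, eps=1e-5):
    """Decorrelate and normalize a batch of latent vectors Z (n x d):
    subtract the mean, then multiply by the inverse square root of the
    covariance, so the output has (approximately) identity covariance."""
    Zc = Z - Z.mean(axis=0)
    cov = Zc.T @ Zc / len(Zc)
    vals, vecs = np.linalg.eigh(cov)          # eigendecomposition of cov
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Zc @ W

def align_axes(Zw, concept_dirs):
    """Rotate whitened activations by an orthogonal Q whose first k
    columns span the k concept directions, so that after rotation
    axis j responds to concept j. QR orthonormalization stands in
    for the learned rotation of the actual CW module."""
    Q, _ = np.linalg.qr(concept_dirs.T, mode="complete")  # d x d orthogonal
    return Zw @ Q

# Toy usage: 512 correlated 16-d activations; three hypothetical concepts,
# each direction estimated as the mean activation of its labelled examples.
rng = np.random.default_rng(0)
Z = rng.normal(size=(512, 16)) @ rng.normal(size=(16, 16))
Zw = zca_whiten(Z)
concept_dirs = np.stack([Zw[i*100:(i+1)*100].mean(axis=0) for i in range(3)])
Za = align_axes(Zw, concept_dirs)
print(np.allclose(np.cov(Za.T), np.eye(16), atol=0.2))  # True: still white
```

Replacing the QR stand-in with a learned orthogonal matrix, and computing the whitening from mini-batch statistics inside the network, is what lets CW act as the drop-in alternative to a batch normalization layer that the abstract describes.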
Pages: 772 - 782
Number of pages: 12
Related Papers
50 records in total
  • [1] Concept whitening for interpretable image recognition
    Chen, Zhi
    Bei, Yijie
    Rudin, Cynthia
    Nature Machine Intelligence, 2020, 2 : 772 - 782
  • [2] Concept-Attention Whitening for Interpretable Skin Lesion Diagnosis
    Hou, Junlin
    Xu, Jilan
    Chen, Hao
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT X, 2024, 15010 : 113 - 123
  • [3] Interpretable Image Recognition in Hyperbolic Space
    Lebedeva, Irina
    Bah, Mohamed Jaward
    Li, Taihao
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 643 - 650
  • [4] Transparent Embedding Space for Interpretable Image Recognition
    Wang, Jiaqi
    Liu, Huafeng
    Jing, Liping
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (05) : 3204 - 3219
  • [5] Multi-Grained Interpretable Network for Image Recognition
    Yang, Peiyu
    Wen, Zeyi
    Mian, Ajmal
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3815 - 3821
  • [6] Interpretable Image Recognition by Constructing Transparent Embedding Space
    Wang, Jiaqi
    Liu, Huafeng
    Wang, Xinyue
    Jing, Liping
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 875 - 884
  • [7] This Looks Like That: Deep Learning for Interpretable Image Recognition
    Chen, Chaofan
    Li, Oscar
    Tao, Chaofan
    Barnett, Alina Jade
    Su, Jonathan
    Rudin, Cynthia
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [8] Think positive: An interpretable neural network for image recognition
    Singh, Gurmail
    NEURAL NETWORKS, 2022, 151 : 178 - 189
  • [9] ADIC: An Adaptive Disentangled CNN Classifier for Interpretable Image Recognition
    Zhao, X.
    Li, Z.
    Wang, W.
    Xu, X.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (08) : 1754 - 1767
  • [10] This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition
    Nauta, Meike
    Jutte, Annemarie
    Provoost, Jesper
    Seifert, Christin
    MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021, PT I, 2021, 1524 : 441 - 456