Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

Cited by: 0

Authors:

Geiger, Atticus [1]

Wu, Zhengxuan [1]

Potts, Christopher [1]

Icard, Thomas [1]

Goodman, Noah D. [1]

Affiliations:

[1] Stanford Univ, Stanford, CA, USA

Source:

CAUSAL LEARNING AND REASONING, VOL 236 | 2024 / Vol. 236

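Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: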

Causal abstraction is a promising theoretical framework for explainable artificial intelligence that defines when an interpretable high-level causal model is a faithful simplification of a low-level deep learning system. However, existing causal abstraction methods have two major limitations: they require a brute-force search over alignments between the high-level model and the low-level one, and they presuppose that variables in the high-level model will align with disjoint sets of neurons in the low-level one. In this paper, we present distributed alignment search (DAS), which overcomes these limitations. In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases, that is, distributed representations. Our experiments show that DAS can discover internal structure that prior approaches miss. Overall, DAS removes previous obstacles to uncovering conceptual structure in trained neural nets.
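
The idea of analyzing representations in a non-standard basis can be illustrated with a small sketch. This is not the authors' implementation. It assumes a two-neuron hidden state, a plain 2-D rotation as the change of basis, and a hand-picked angle `theta` (in DAS, per the abstract, the alignment would be found by gradient descent). The function names and the example vectors are hypothetical.

```python
import math

def rotate(vec, theta):
    """Rotate a 2-D vector by theta radians (an orthogonal change of basis)."""
    x, y = vec
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def distributed_interchange(base, source, theta):
    """Swap one coordinate of the rotated basis from source into base.

    Illustrative assumption: the high-level variable is aligned with the
    first coordinate of the rotated space, not with a single neuron.
    """
    rotated_base = rotate(base, theta)
    rotated_source = rotate(source, theta)
    # Take the aligned coordinate from source, keep the rest from base.
    mixed = (rotated_source[0], rotated_base[1])
    # Map back to the original neuron basis.
    return rotate(mixed, -theta)

# Hypothetical hidden states for a base input and a source input.
base = (1.0, 0.0)
source = (0.0, 1.0)

# theta = 0 reduces to swapping a single neuron (the standard basis).
standard_patch = distributed_interchange(base, source, 0.0)

# A non-zero theta intervenes on a direction that mixes both neurons.
distributed_patch = distributed_interchange(base, source, math.pi / 4)
```

With `theta = 0` the intervention touches only the first neuron, while a non-zero angle changes both neurons at once, which is the sense in which a single neuron can take part in more than one role.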

Pages: 160-187

Page count: 28

