Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

Authors
Geiger, Atticus [1]
Wu, Zhengxuan [1]
Potts, Christopher [1]
Icard, Thomas [1]
Goodman, Noah D. [1]
Affiliations
[1] Stanford Univ, Stanford, CA USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Causal abstraction is a promising theoretical framework for explainable artificial intelligence that defines when an interpretable high-level causal model is a faithful simplification of a low-level deep learning system. However, existing causal abstraction methods have two major limitations: they require a brute-force search over alignments between the high-level model and the low-level one, and they presuppose that variables in the high-level model will align with disjoint sets of neurons in the low-level one. In this paper, we present distributed alignment search (DAS), which overcomes these limitations. In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases, that is, distributed representations. Our experiments show that DAS can discover internal structure that prior approaches miss. Overall, DAS removes previous obstacles to uncovering conceptual structure in trained neural nets.
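
The idea of analyzing representations in a non-standard basis can be illustrated with a small sketch. It is not the authors' implementation: it assumes the change of basis is an orthogonal matrix, and the names `random_orthogonal`, `distributed_interchange`, and the subspace size `k` are hypothetical. The rotation here is drawn at random purely for illustration, whereas the abstract describes finding the alignment by gradient descent.

```python
import numpy as np


def random_orthogonal(dim, seed=0):
    """Draw a random orthogonal matrix via QR (stand-in for a learned rotation)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def distributed_interchange(base, source, rotation, k):
    """Replace the first k rotated coordinates of `base` with those of `source`.

    The columns of `rotation` are treated as the basis vectors, so
    `rotation.T @ h` gives the coordinates of a hidden vector h in that basis.
    """
    base_rot = rotation.T @ base        # hidden vector of the base input, rotated
    source_rot = rotation.T @ source    # hidden vector of the source input, rotated
    mixed = base_rot.copy()
    mixed[:k] = source_rot[:k]          # swap only the chosen k-dimensional subspace
    return rotation @ mixed             # map back to the neuron basis


if __name__ == "__main__":
    base = np.array([1.0, 2.0, 3.0, 4.0])
    source = np.array([5.0, 6.0, 7.0, 8.0])
    rotation = random_orthogonal(4, seed=1)
    print(distributed_interchange(base, source, rotation, k=2))
```

With the identity matrix as `rotation`, the swap reduces to exchanging the first `k` neurons, which corresponds to the disjoint-neuron assumption the abstract attributes to earlier methods. With a general rotation, every neuron can contribute to the swapped subspace, so a single neuron may take part in more than one role.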
Pages: 160-187
Page count: 28