Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models

Cited by: 0
Authors
Finlayson, Matthew [1]
Mueller, Aaron [2]
Gehrmann, Sebastian [3]
Shieber, Stuart [1]
Linzen, Tal [4]
Belinkov, Yonatan [5]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Johns Hopkins Univ, Baltimore, MD USA
[3] Google Res, New York, NY USA
[4] NYU, New York, NY USA
[5] Technion IIT, Haifa, Israel
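Funding
Israel Science Foundation; U.S. National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract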
Targeted syntactic evaluations have demonstrated the ability of language models to perform subject-verb agreement in difficult contexts. To elucidate the mechanisms by which the models accomplish this behavior, this study applies causal mediation analysis to pre-trained neural language models. We investigate the magnitude of models' preferences for grammatical inflections, as well as whether neurons process subject-verb agreement similarly across sentences with different syntactic structures. We uncover similarities and differences across architectures and model sizes: notably, larger models do not necessarily learn stronger preferences. We also observe two distinct mechanisms for producing subject-verb agreement depending on the syntactic structure of the input sentence. Finally, we find that language models rely on similar sets of neurons when given sentences with similar syntactic structure.
Pages: 1828-1843
Page count: 16
Related papers
50 in total
  • [41] Multimodal Neural Language Models
    Kiros, Ryan
    Salakhutdinov, Ruslan
    Zemel, Richard
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 595 - 603
  • [42] Dynamic Neural Language Models
    Delasalles, Edouard
    Lamprier, Sylvain
    Denoyer, Ludovic
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2019), PT III, 2019, 11955 : 282 - 294
  • [43] Neural mechanisms of the continued influence effect of misinformation: Analysis based on fMRI causal connectivity
    Jia, Lina
    Jin, Hua
    Jin, Xiaokang
    [J]. NEUROSCIENCE LETTERS, 2024, 836
  • [44] Dissociable Neural Mechanisms for Human Inference Processing Predicted by Static and Contextual Language Models
    Uchida, Takahisa
    Lair, Nicolas
    Ishiguro, Hiroshi
    Dominey, Peter Ford
    [J]. NEUROBIOLOGY OF LANGUAGE, 2024, 5 (01) : 248 - 263
  • [45] Czech Grammar Agreement Dataset for Evaluation of Language Models
    Baisa, Vit
    [J]. RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING (RASLAN 2016), 2016, : 63 - 67
  • [46] Statistical Models for Causal Analysis
    Koufteros, Xenophon A.
    [J]. STRUCTURAL EQUATION MODELING-A MULTIDISCIPLINARY JOURNAL, 1996, 3 (03) : 300 - 302
  • [47] Neural mechanisms of language development in infancy
    Huberty, Scott
    O'Reilly, Christian
    Leno, Virginia Carter
    Steiman, Mandy
    Webb, Sara
    Elsabbagh, Mayada
    [J]. INFANCY, 2023, 28 (04) : 754 - 770
  • [48] Language and Cognition Interaction Neural Mechanisms
    Perlovsky, Leonid
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2011, 2011
  • [49] Syntactic processing in language and music: Theoretical and neural similarities and differences.
    Patel, AD
    [J]. JOURNAL OF COGNITIVE NEUROSCIENCE, 1999, : 47 - 47
  • [50] Neural Mechanisms of Syntactic Movement: An ERPs Study of Chinese Passive Sentences
    Tao, Liu
    Huo, Jiang
    [J]. YUYAN KEXUE-LINGUISTIC SCIENCES, 2016, 15 (06) : 612 - 624