Interpreting Pretrained Language Models via Concept Bottlenecks

Cited by: 0
Authors
Tan, Zhen [1 ]
Cheng, Lu [2 ]
Wang, Song [3 ]
Yuan, Bo [4 ]
Li, Jundong [3 ]
Liu, Huan [1 ]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85281 USA
[2] Univ Illinois, Chicago, IL USA
[3] Univ Virginia, Charlottesville, VA USA
[4] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
Keywords
Language Models; Interpretability; Conceptual Learning
DOI
10.1007/978-981-97-2259-4_5
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. However, the lack of interpretability due to their "black-box" nature poses challenges for responsible implementation. Although previous studies have attempted to improve interpretability by using, e.g., attention weights in self-attention layers, these weights often lack clarity, readability, and intuitiveness. In this research, we propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans. For example, we learn the concept of "Food" and investigate how it influences a model's sentiment prediction for a restaurant review. We introduce C3M, which combines human-annotated and machine-generated concepts to extract hidden neurons designed to encapsulate semantically meaningful and task-specific concepts. Through empirical evaluations on real-world datasets, we show that our approach offers valuable insights to interpret PLM behavior, helps diagnose model failures, and enhances model robustness amidst noisy concept labels.
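
The abstract describes placing an interpretable concept layer between a PLM encoder and the task classifier. As a rough illustration of that general idea only (not the authors' C3M implementation), the following PyTorch/Hugging Face sketch adds a bottleneck whose sigmoid activations stand for human-readable concepts such as "Food"; the class name, concept count, and training details are illustrative assumptions.

# Hypothetical concept-bottleneck head on a pretrained LM (illustrative sketch,
# not the authors' C3M code).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ConceptBottleneckClassifier(nn.Module):
    def __init__(self, plm_name="bert-base-uncased", num_concepts=8, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)
        hidden = self.encoder.config.hidden_size
        # Bottleneck: every prediction must pass through these concept scores.
        self.concept_head = nn.Linear(hidden, num_concepts)
        self.label_head = nn.Linear(num_concepts, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_vec = out.last_hidden_state[:, 0]                 # [CLS] token embedding
        concepts = torch.sigmoid(self.concept_head(cls_vec))  # interpretable concept scores
        logits = self.label_head(concepts)                    # label depends only on concepts
        return concepts, logits

# Usage sketch: supervise `concepts` with human- or machine-annotated concept
# labels (e.g. BCE loss) and `logits` with task labels (cross-entropy), so a
# sentiment prediction can be traced back to concepts such as "Food".
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["The food was great, but the service was slow."],
                  return_tensors="pt", padding=True, truncation=True)
model = ConceptBottleneckClassifier()
concept_scores, label_logits = model(batch["input_ids"], batch["attention_mask"])
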
Pages: 56-74
Page count: 19