Interpreting Pretrained Language Models via Concept Bottlenecks

Cited by: 0
Authors
Tan, Zhen [1 ]
Cheng, Lu [2 ]
Wang, Song [3 ]
Yuan, Bo [4 ]
Li, Jundong [3 ]
Liu, Huan [1 ]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85281 USA
[2] Univ Illinois, Chicago, IL USA
[3] Univ Virginia, Charlottesville, VA USA
[4] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
Keywords
Language Models; Interpretability; Conceptual Learning
DOI
10.1007/978-981-97-2259-4_5
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. However, the lack of interpretability due to their "black-box" nature poses challenges for responsible deployment. Although previous studies have attempted to improve interpretability by using, for example, attention weights in self-attention layers, these weights often lack clarity, readability, and intuitiveness. In this research, we propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans. For example, we learn the concept of "Food" and investigate how it influences the model's sentiment prediction for a restaurant review. We introduce C3M, which combines human-annotated and machine-generated concepts to extract hidden neurons designed to encapsulate semantically meaningful and task-specific concepts. Through empirical evaluations on real-world datasets, we show that our approach offers valuable insights for interpreting PLM behavior, helps diagnose model failures, and enhances model robustness amid noisy concept labels.
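To make the concept-bottleneck idea in the abstract concrete, below is a minimal illustrative sketch, not the paper's C3M implementation: a pretrained encoder's [CLS] representation is mapped to a small set of human-readable concept scores (e.g., "Food"), and the task label is predicted only from those scores, so each concept's contribution to the prediction can be read off the final linear layer. The class name ConceptBottleneckClassifier, the choice of bert-base-uncased, and hyperparameters such as num_concepts are assumptions made for illustration.

```python
# Illustrative sketch of a concept-bottleneck head on top of a PLM (not the authors' C3M code).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ConceptBottleneckClassifier(nn.Module):
    def __init__(self, plm_name="bert-base-uncased", num_concepts=8, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)        # pretrained language model
        hidden = self.encoder.config.hidden_size
        self.concept_head = nn.Linear(hidden, num_concepts)       # predicts human-readable concepts
        self.label_head = nn.Linear(num_concepts, num_labels)     # task label uses ONLY concept scores

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]                       # [CLS] token representation
        concept_logits = self.concept_head(pooled)                 # supervised with concept labels (e.g., "Food")
        label_logits = self.label_head(torch.sigmoid(concept_logits))
        return concept_logits, label_logits

# Usage: inspect how concepts contribute to a sentiment prediction on a restaurant review.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = ConceptBottleneckClassifier()
batch = tokenizer(["The pasta was excellent but the service was slow."],
                  return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    concept_logits, label_logits = model(batch["input_ids"], batch["attention_mask"])
# label_head.weight[c, k] indicates how strongly concept k pushes the prediction toward class c.
```

Because the classifier sees only the concept activations, the weights of label_head give a direct, human-interpretable account of each prediction, which is the property the abstract highlights.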
Pages: 56-74
Page count: 19
Related Papers (50 in total)
  • [1] Peng, Baolin; Zhu, Chenguang; Zeng, Michael; Gao, Jianfeng. Data Augmentation for Spoken Language Understanding via Pretrained Language Models. INTERSPEECH 2021, 2021: 1219-1223.
  • [2] Sun, Kaili; Luo, Xudong; Luo, Michael Y. A Survey of Pretrained Language Models. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, 2022, 13369: 442-456.
  • [3] Schick, Timo; Schuetze, Hinrich. Generating Datasets with Pretrained Language Models. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 6943-6951.
  • [4] Hofmann, Valentin; Glavas, Goran; Ljubesic, Nikola; Pierrehumbert, Janet B.; Schuetze, Hinrich. Geographic Adaptation of Pretrained Language Models. TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2024, 12: 411-431.
  • [5] Koto, Fajri; Lau, Jey Han; Baldwin, Timothy. Discourse Probing of Pretrained Language Models. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021: 3849-3864.
  • [6] Tamkin, Alex; Singh, Trisha; Giovanardi, Davide; Goodman, Noah. Investigating Transferability in Pretrained Language Models. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020: 1393-1401.
  • [7] Hassid, Michael; Remez, Tal; Nguyen, Tu Anh; Gat, Itai; Conneau, Alexis; Kreuk, Felix; Copet, Jade; Defossez, Alexandre; Synnaeve, Gabriel; Dupoux, Emmanuel; Schwartz, Roy; Adi, Yossi. Textually Pretrained Speech Language Models. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023.
  • [8] Niu, Tong; Yavuz, Semih; Zhou, Yingbo; Keskar, Nitish Shirish; Wang, Huan; Xiong, Caiming. Unsupervised Paraphrasing with Pretrained Language Models. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 5136-5150.
  • [9] Elazar, Yanai; Kassner, Nora; Ravfogel, Shauli; Ravichander, Abhilasha; Hovy, Eduard; Schutze, Hinrich; Goldberg, Yoav. Measuring and Improving Consistency in Pretrained Language Models. TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2021, 9: 1012-1031.
  • [10] Yu, Haibin; Zhao, Jing; Yang, Song; Wu, Zhongqin; Nie, Yuting; Zhang, Wei-Qiang. Language Recognition Based on Unsupervised Pretrained Models. INTERSPEECH 2021, 2021: 3271-3275.