Interpreting Pretrained Language Models via Concept Bottlenecks

Cited by: 0
Authors
Tan, Zhen [1 ]
Cheng, Lu [2 ]
Wang, Song [3 ]
Yuan, Bo [4 ]
Li, Jundong [3 ]
Liu, Huan [1 ]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85281 USA
[2] Univ Illinois, Chicago, IL USA
[3] Univ Virginia, Charlottesville, VA USA
[4] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
Keywords
Language Models; Interpretability; Conceptual Learning
DOI
10.1007/978-981-97-2259-4_5
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. However, the lack of interpretability due to their "black-box" nature poses challenges for responsible implementation. Although previous studies have attempted to improve interpretability by using, e.g., attention weights in self-attention layers, these weights often lack clarity, readability, and intuitiveness. In this research, we propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans. For example, we learn the concept of "Food" and investigate how it influences a model's sentiment prediction for a restaurant review. We introduce C3M, which combines human-annotated and machine-generated concepts to extract hidden neurons designed to encapsulate semantically meaningful and task-specific concepts. Through empirical evaluations on real-world datasets, we show that our approach offers valuable insights to interpret PLM behavior, helps diagnose model failures, and enhances model robustness amidst noisy concept labels.
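
The abstract describes placing an interpretable concept layer between a PLM encoder and the task classifier. As a rough illustration of that general idea only (not the authors' C3M implementation), the following PyTorch/Hugging Face sketch adds a bottleneck whose sigmoid activations stand for human-readable concepts such as "Food"; the class name, concept count, and training details are illustrative assumptions.

# Hypothetical concept-bottleneck head on a pretrained LM (illustrative sketch,
# not the authors' C3M code).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ConceptBottleneckClassifier(nn.Module):
    def __init__(self, plm_name="bert-base-uncased", num_concepts=8, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)
        hidden = self.encoder.config.hidden_size
        # Bottleneck: every prediction must pass through these concept scores.
        self.concept_head = nn.Linear(hidden, num_concepts)
        self.label_head = nn.Linear(num_concepts, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_vec = out.last_hidden_state[:, 0]                 # [CLS] token embedding
        concepts = torch.sigmoid(self.concept_head(cls_vec))  # interpretable concept scores
        logits = self.label_head(concepts)                    # label depends only on concepts
        return concepts, logits

# Usage sketch: supervise `concepts` with human- or machine-annotated concept
# labels (e.g. BCE loss) and `logits` with task labels (cross-entropy), so a
# sentiment prediction can be traced back to concepts such as "Food".
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["The food was great, but the service was slow."],
                  return_tensors="pt", padding=True, truncation=True)
model = ConceptBottleneckClassifier()
concept_scores, label_logits = model(batch["input_ids"], batch["attention_mask"])
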
Pages: 56-74
Page count: 19