Explainability for Large Language Models: A Survey

Cited by: 18
Authors
Zhao, Haiyan [1 ]
Chen, Hanjie [2 ]
Yang, Fan [3 ]
Liu, Ninghao [4 ]
Deng, Huiqi [5 ]
Cai, Hengyi [6 ]
Wang, Shuaiqiang [7 ]
Yin, Dawei [7 ]
Du, Mengnan [1 ]
Affiliations
[1] New Jersey Inst Technol, 323 Dr Martin Luther King Jr Blvd, Newark, NJ 07102 USA
[2] Johns Hopkins Univ, 3400 N Charles St, Baltimore, MD 21218 USA
[3] Wake Forest Univ, 1834 Wake Forest Rd, Winston Salem, NC 27109 USA
[4] Univ Georgia, Herty Dr, Athens, GA 30602 USA
[5] Shanghai Jiao Tong Univ, 800 Dongchuan RD, Shanghai 200240, Peoples R China
[6] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[7] 10 Shangdi 10th St, Beijing 100085, Peoples R China
Keywords
Explainability; interpretability; large language models;
DOI
10.1145/3639372
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear, and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this article, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: the traditional fine-tuning-based paradigm and the prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations and show how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional deep learning models.
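The abstract distinguishes local explanations of individual predictions from global explanations of overall model knowledge. As a minimal, hedged sketch of one widely used family of local-explanation methods (perturbation- or occlusion-based feature attribution), the snippet below scores each input token by how much a model's output changes when that token is removed. The `toy_predict` classifier is a hypothetical stand-in introduced only for illustration, not a method from the survey; in practice the same callable would wrap a fine-tuned or prompted LLM.

```python
from typing import Callable, List, Tuple


def occlusion_attribution(
    tokens: List[str],
    predict: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Leave-one-out (occlusion) local explanation: score each token by the
    drop in the model's score when that token is removed from the input."""
    base_score = predict(" ".join(tokens))
    attributions = []
    for i, token in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        attributions.append((token, base_score - predict(" ".join(reduced))))
    return attributions


if __name__ == "__main__":
    # Hypothetical stand-in for an LLM-based classifier:
    # score = fraction of "positive" words in the input.
    POSITIVE = {"great", "good", "excellent"}

    def toy_predict(text: str) -> float:
        words = text.lower().split()
        return sum(w in POSITIVE for w in words) / max(len(words), 1)

    for token, score in occlusion_attribution("The movie was great".split(), toy_predict):
        print(f"{token:>8s}  {score:+.3f}")
```

Tokens with large positive attributions are those the (toy) model relied on most for its prediction; global explanation methods surveyed in the paper instead probe what knowledge the model encodes across many inputs.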
Pages: 38
Related Papers
50 records in total
  • [1] Large Language Models in Finance: A Survey
    Li, Yinheng
    Wang, Shaofei
    Ding, Han
    Chen, Hang
    [J]. PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2023, 2023, : 374 - 382
  • [2] Large language models in law: A survey
    Lai, Jinqi
    Gan, Wensheng
    Wu, Jiayang
    Qi, Zhenlian
    Yu, Philip S.
    [J]. AI Open, 2024, 5 : 181 - 196
  • [3] A survey on LoRA of large language models
    Mao, Yuren
    Ge, Yuhang
    Fan, Yijiang
    Xu, Wenyi
    Mi, Yu
    Hu, Zhonghao
    Gao, Yunjun
    [J]. Frontiers of Computer Science, 2025, 19 (07)
  • [4] A survey on large language models for recommendation
    Wu, Likang
    Zheng, Zhi
    Qiu, Zhaopeng
    Wang, Hao
    Gu, Hongchao
    Shen, Tingjia
    Qin, Chuan
    Zhu, Chen
    Zhu, Hengshu
    Liu, Qi
    Xiong, Hui
    Chen, Enhong
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (05)
  • [5] Large language models for medicine: a survey
    Zheng, Yanxin
    Gan, Wensheng
    Chen, Zefeng
    Qi, Zhenlian
    Liang, Qian
    Yu, Philip S.
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024
  • [6] A Survey on Evaluation of Large Language Models
    Chang, Yupeng
    Wang, Xu
    Wang, Jindong
    Wu, Yuan
    Yang, Linyi
    Zhu, Kaijie
    Chen, Hao
    Yi, Xiaoyuan
    Wang, Cunxiang
    Wang, Yidong
    Ye, Wei
    Zhang, Yue
    Chang, Yi
    Yu, Philip S.
    Yang, Qiang
    Xie, Xing
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2024, 15 (03)
  • [7] On the Reliability and Explainability of Language Models for Program Generation
    Liu, Yue
    Tantithamthavorn, Chakkrit
    Liu, Yonghui
    Li, Li
    [J]. ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (05)
  • [8] Jailbreak Attack for Large Language Models: A Survey
    Li, Nan
    Ding, Yidong
    Jiang, Haoyu
    Niu, Jiafei
    Yi, Ping
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2024, 61 (05): 1156 - 1181
  • [9] Privacy issues in Large Language Models: A survey
    Kibriya, Hareem
    Khan, Wazir Zada
    Siddiqa, Ayesha
    Khan, Muhammad Khurrum
    [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2024, 120
  • [10] On the Explainability of Natural Language Processing Deep Models
    El Zini, Julia
    Awad, Mariette
    [J]. ACM COMPUTING SURVEYS, 2023, 55 (05)