Explainability for Large Language Models: A Survey

Cited by: 18
Authors
Zhao, Haiyan [1 ]
Chen, Hanjie [2 ]
Yang, Fan [3 ]
Liu, Ninghao [4 ]
Deng, Huiqi [5 ]
Cai, Hengyi [6 ]
Wang, Shuaiqiang [7 ]
Yin, Dawei [7 ]
Du, Mengnan [1 ]
Affiliations
[1] New Jersey Inst Technol, 323 Dr Martin Luther King Jr Blvd, Newark, NJ 07102 USA
[2] Johns Hopkins Univ, 3400 N Charles St, Baltimore, MD 21218 USA
[3] Wake Forest Univ, 1834 Wake Forest Rd, Winston Salem, NC 27109 USA
[4] Univ Georgia, Herty Dr, Athens, GA 30602 USA
[5] Shanghai Jiao Tong Univ, 800 Dongchuan RD, Shanghai 200240, Peoples R China
[6] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[7] 10 Shangdi 10th St, Beijing 100085, Peoples R China
Keywords
Explainability; interpretability; large language models;
DOI
10.1145/3639372
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear, and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this article, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: the traditional fine-tuning-based paradigm and the prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations and show how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional deep learning models.
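The abstract distinguishes local explanations of individual predictions from global explanations of overall model knowledge. As a minimal, hedged sketch of one widely used family of local-explanation methods (perturbation- or occlusion-based feature attribution), the snippet below scores each input token by how much a model's output changes when that token is removed. The `toy_predict` classifier is a hypothetical stand-in introduced only for illustration, not a method from the survey; in practice the same callable would wrap a fine-tuned or prompted LLM.

```python
from typing import Callable, List, Tuple


def occlusion_attribution(
    tokens: List[str],
    predict: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Leave-one-out (occlusion) local explanation: score each token by the
    drop in the model's score when that token is removed from the input."""
    base_score = predict(" ".join(tokens))
    attributions = []
    for i, token in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        attributions.append((token, base_score - predict(" ".join(reduced))))
    return attributions


if __name__ == "__main__":
    # Hypothetical stand-in for an LLM-based classifier:
    # score = fraction of "positive" words in the input.
    POSITIVE = {"great", "good", "excellent"}

    def toy_predict(text: str) -> float:
        words = text.lower().split()
        return sum(w in POSITIVE for w in words) / max(len(words), 1)

    for token, score in occlusion_attribution("The movie was great".split(), toy_predict):
        print(f"{token:>8s}  {score:+.3f}")
```

Tokens with large positive attributions are those the (toy) model relied on most for its prediction; global explanation methods surveyed in the paper instead probe what knowledge the model encodes across many inputs.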
Pages: 38
Related Papers
50 records in total
  • [1] Large Language Models in Finance: A Survey
    Li, Yinheng
    Wang, Shaofei
    Ding, Han
    Chen, Hang
    [J]. PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2023, 2023, : 374 - 382
  • [2] Large language models in law: A survey
    Lai, Jinqi
    Gan, Wensheng
    Wu, Jiayang
    Qi, Zhenlian
    Yu, Philip S.
    [J]. AI Open, 2024, 5 : 181 - 196
  • [3] A survey on LoRA of large language models
    Mao, Yuren
    Ge, Yuhang
    Fan, Yijiang
    Xu, Wenyi
    Mi, Yu
    Hu, Zhonghao
    Gao, Yunjun
    [J]. Frontiers of Computer Science, 2025, 19 (07)
  • [4] A survey on large language models for recommendation
    Wu, Likang
    Zheng, Zhi
    Qiu, Zhaopeng
    Wang, Hao
    Gu, Hongchao
    Shen, Tingjia
    Qin, Chuan
    Zhu, Chen
    Zhu, Hengshu
    Liu, Qi
    Xiong, Hui
    Chen, Enhong
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (05)
  • [5] Large language models for medicine: a survey
    Zheng, Yanxin
    Gan, Wensheng
    Chen, Zefeng
    Qi, Zhenlian
    Liang, Qian
    Yu, Philip S.
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024
  • [6] A Survey on Evaluation of Large Language Models
    Chang, Yupeng
    Wang, Xu
    Wang, Jindong
    Wu, Yuan
    Yang, Linyi
    Zhu, Kaijie
    Chen, Hao
    Yi, Xiaoyuan
    Wang, Cunxiang
    Wang, Yidong
    Ye, Wei
    Zhang, Yue
    Chang, Yi
    Yu, Philip S.
    Yang, Qiang
    Xie, Xing
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2024, 15 (03)
  • [7] On the Reliability and Explainability of Language Models for Program Generation
    Liu, Yue
    Tantithamthavorn, Chakkrit
    Liu, Yonghui
    Li, Li
    [J]. ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (05)
  • [8] Jailbreak Attack for Large Language Models: A Survey
    Li, Nan
    Ding, Yidong
    Jiang, Haoyu
    Niu, Jiafei
    Yi, Ping
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2024, 61 (05): 1156 - 1181
  • [9] Privacy issues in Large Language Models: A survey
    Kibriya, Hareem
    Khan, Wazir Zada
    Siddiqa, Ayesha
    Khan, Muhammad Khurrum
    [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2024, 120
  • [10] On the Explainability of Natural Language Processing Deep Models
    El Zini, Julia
    Awad, Mariette
    [J]. ACM COMPUTING SURVEYS, 2023, 55 (05)