Survey of Hallucination in Natural Language Generation

Cited by: 504
Authors
Ji, Ziwei [1]
Lee, Nayeon [1]
Frieske, Rita [1]
Yu, Tiezheng [1]
Su, Dan [1]
Xu, Yan [1]
Ishii, Etsuko [1]
Bang, Ye Jin [1]
Madotto, Andrea [1]
Fung, Pascale [1]
Affiliation
[1] Hong Kong Univ Sci & Technol, Ctr Artificial Intelligence Res CAiRE, Room 2602A,Clear Water Bay, Hong Kong, Peoples R China
Keywords
Hallucination; intrinsic hallucination; extrinsic hallucination; faithfulness in NLG; factuality in NLG; consistency in NLG
DOI
10.1145/3571730
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions, and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, and machine translation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
Pages: 38