A Field Guide to Automatic Evaluation of LLM-Generated Summaries

Cited by: 0
Authors
van Schaik, Tempest A. [1]
Pugh, Brittany [1]
Affiliation
[1] Microsoft, Redmond, WA 98052 USA
Keywords
Evaluation metrics; LLMs; summarization; offline evaluation
DOI
10.1145/3626772.3661346
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) are rapidly being adopted for tasks such as text summarization across a wide range of industries. This has driven the need for scalable, automatic, reliable, and cost-effective methods to evaluate the quality of LLM-generated text. What it means to evaluate an LLM is not yet well defined, and expectations differ widely about what kind of information evaluation will produce. Evaluation methods developed for traditional Natural Language Processing (NLP) tasks, before the rise of LLMs, remain applicable but are not sufficient for capturing high-level semantic qualities of summaries. Emerging evaluation methods that use LLMs to evaluate LLM output appear to be powerful but lack reliability. New elements of LLM-generated text that were not present in previous NLP tasks, such as hallucination artifacts, need to be considered. We outline the different types of LLM evaluation currently used in the literature but focus on offline, system-level evaluation of the text generated by LLMs. Evaluating LLM-generated summaries is a complex and fast-evolving area, and we propose strategies for applying evaluation methods to avoid common pitfalls. Despite these promising strategies for evaluating LLM summaries, we highlight some open challenges that remain.
Pages: 2832-2836
Page count: 5