A Field Guide to Automatic Evaluation of LLM-Generated Summaries

Times cited: 0

Authors:

van Schaik, Tempest A. [1]

Pugh, Brittany [1]

Affiliations:

[1] Microsoft, Redmond, WA 98052 USA

Source:

PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024

Keywords:

Evaluation metrics; LLMs; summarization; offline evaluation

DOI:

10.1145/3626772.3661346

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Large language models (LLMs) are rapidly being adopted for tasks such as text summarization in a wide range of industries. This has driven the need for scalable, automatic, reliable, and cost-effective methods to evaluate the quality of LLM-generated text. What is meant by evaluating an LLM is not yet well defined, and there are widely different expectations about what kind of information evaluation will produce. Evaluation methods that were developed for traditional Natural Language Processing (NLP) tasks (before the rise of LLMs) remain applicable but are not sufficient for capturing high-level semantic qualities of summaries. Emerging evaluation methods that use LLMs to evaluate LLM output appear to be powerful but lack reliability. New elements of LLM-generated text that were not part of previous NLP tasks, such as the artifacts of hallucination, need to be considered. We outline the different types of LLM evaluation currently used in the literature but focus on offline, system-level evaluation of the text generated by LLMs. Evaluating LLM-generated summaries is a complex and fast-evolving area, and we propose strategies for applying evaluation methods to avoid common pitfalls. Despite having promising strategies for evaluating LLM summaries, we highlight some open challenges that remain.

Pages: 2832-2836

Page count: 5
