Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation

Cited by: 0
Authors
Chhun, Cyril [1]
Suchanek, Fabian M. [1]
Clavel, Chloe [2]
Affiliations
[1] Inst Polytech Paris, LTCI Telecom Paris, Paris, France
[2] INRIA Paris, ALMAnaCH, Paris, France
Keywords
RELIABILITY;
DOI
10.1162/tacl_a_00689
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Storytelling is an integral part of human experience and plays a crucial role in social interactions. Thus, Automatic Story Evaluation (ASE) and Generation (ASG) could benefit society in multiple ways, but they are challenging tasks which require high-level human abilities such as creativity, reasoning, and deep understanding. Meanwhile, Large Language Models (LLMs) now achieve state-of-the-art performance on many NLP tasks. In this paper, we study whether LLMs can be used as substitutes for human annotators for ASE. We perform an extensive analysis of the correlations between LLM ratings, other automatic measures, and human annotations, and we explore the influence of prompting on the results and the explainability of LLM behaviour. Most notably, we find that LLMs outperform current automatic measures for system-level evaluation but still struggle at providing satisfactory explanations for their answers.
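The abstract distinguishes system-level evaluation from rating individual stories. The following is a minimal sketch of that distinction, not the authors' procedure: it assumes Kendall's tau-a as the correlation measure and uses made-up toy ratings. The function names `kendall_tau` and `system_level_scores`, the system labels "A", "B", "C", and all scores are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total pairs; tied pairs count as 0."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        prod = (x1 - x2) * (y1 - y2)
        score += (prod > 0) - (prod < 0)
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return score / n_pairs


def system_level_scores(ratings):
    """Average per-story ratings into one score per story-generation system."""
    by_system = defaultdict(list)
    for system, rating in ratings:
        by_system[system].append(rating)
    return {system: mean(values) for system, values in by_system.items()}


# Hypothetical toy data: (system, rating) for the same six stories, in the same order.
human = [("A", 4), ("A", 5), ("B", 2), ("B", 3), ("C", 3), ("C", 4)]
llm = [("A", 5), ("A", 4), ("B", 3), ("B", 2), ("C", 4), ("C", 3)]

# Story-level: correlate the raw per-story ratings.
story_tau = kendall_tau([r for _, r in human], [r for _, r in llm])

# System-level: average per system first, then correlate the system means.
human_sys = system_level_scores(human)
llm_sys = system_level_scores(llm)
systems = sorted(human_sys)
system_tau = kendall_tau(
    [human_sys[s] for s in systems],
    [llm_sys[s] for s in systems],
)

print(f"story-level tau:  {story_tau:.3f}")
print(f"system-level tau: {system_tau:.3f}")
```

In this toy data the two raters disagree on individual stories but agree on the per-system averages, so the system-level correlation comes out higher than the story-level one. This only illustrates why the two levels can diverge; it says nothing about the magnitudes reported in the paper.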
Pages: 1122-1142
Page count: 21