Evaluating Creativity: Can LLMs Be Good Evaluators in Creative Writing Tasks?

Cited by: 0
Authors
Kim, Sungeun [1]
Oh, Dongsuk [1]
Affiliations
[1] Kyungpook Natl Univ, Dept English Language & Literature, Daegu 41566, South Korea
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / No. 06
Keywords
large language models (LLMs) evaluation; creative writing evaluation; creativity; AI evaluation; human evaluation
DOI
10.3390/app15062971
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The evaluation of creative writing has long been a complex and subjective process, made even more intriguing by the rise of advanced Artificial Intelligence (AI) tools like Large Language Models (LLMs). This study evaluates the potential of LLMs as reliable and consistent evaluators of creative texts, directly comparing their performance with traditional human evaluations. The analysis focuses on key creative criteria, including fluency, flexibility, elaboration, originality, usefulness, and specific creativity strategies. Results demonstrate that LLMs provide consistent and objective evaluations, achieving higher Inter-Annotator Agreement (IAA) compared with human evaluators. However, LLMs face limitations in recognizing nuanced, culturally specific, and context-dependent aspects of creativity. Conversely, human evaluators, despite lower consistency and higher subjectivity, exhibit strengths in capturing deeper contextual insights. These findings highlight the need for the further refinement of LLMs to address the complexities of creative writing evaluation.
Pages: 19