Evaluating Creativity: Can LLMs Be Good Evaluators in Creative Writing Tasks?

Cited by: 0
Authors
Kim, Sungeun [1 ]
Oh, Dongsuk [1 ]
Affiliations
[1] Kyungpook Natl Univ, Dept English Language & Literature, Daegu 41566, South Korea
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 06
Keywords
large language models (LLMs) evaluation; creative writing evaluation; creativity; AI evaluation; human evaluation
DOI
10.3390/app15062971
Chinese Library Classification Code
O6 [Chemistry]
Subject Classification Code
0703
Abstract
The evaluation of creative writing has long been a complex and subjective process, made even more intriguing by the rise of advanced Artificial Intelligence (AI) tools such as Large Language Models (LLMs). This study evaluates the potential of LLMs as reliable and consistent evaluators of creative texts, directly comparing their performance with traditional human evaluation. The analysis focuses on key creative criteria, including fluency, flexibility, elaboration, originality, usefulness, and specific creativity strategies. Results demonstrate that LLMs provide consistent and objective evaluations, achieving higher Inter-Annotator Agreement (IAA) than human evaluators. However, LLMs face limitations in recognizing nuanced, culturally specific, and context-dependent aspects of creativity. Conversely, human evaluators, despite lower consistency and higher subjectivity, exhibit strengths in capturing deeper contextual insights. These findings highlight the need for further refinement of LLMs to address the complexities of creative writing evaluation.
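The abstract does not state which agreement statistic underlies the IAA comparison; as a minimal sketch of how such a comparison is often run, the snippet below contrasts paired raters' ordinal creativity scores using quadratic-weighted Cohen's kappa via scikit-learn's `cohen_kappa_score`. All rating values and variable names are hypothetical illustrations, not data from the paper.

```python
# Minimal sketch: comparing inter-annotator agreement (IAA) for LLM vs.
# human evaluators on ordinal creativity ratings. All scores below are
# hypothetical; the paper's raw ratings are not reproduced here.
from sklearn.metrics import cohen_kappa_score

# Hypothetical 1-5 ratings of ten texts on a single criterion (e.g. originality).
llm_run_1 = [4, 3, 5, 2, 4, 3, 4, 5, 2, 3]
llm_run_2 = [4, 3, 4, 2, 4, 3, 4, 5, 3, 3]
human_a   = [5, 2, 4, 3, 3, 4, 2, 5, 1, 4]
human_b   = [3, 3, 5, 2, 4, 2, 4, 4, 2, 2]

# Quadratic weighting treats the 1-5 scale as ordinal, so near misses
# (4 vs. 5) are penalized less than large disagreements (1 vs. 5).
llm_iaa   = cohen_kappa_score(llm_run_1, llm_run_2, weights="quadratic")
human_iaa = cohen_kappa_score(human_a, human_b, weights="quadratic")

print(f"LLM IAA (weighted kappa):   {llm_iaa:.2f}")
print(f"Human IAA (weighted kappa): {human_iaa:.2f}")
```

With more than two raters per condition, a chance-corrected multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual substitute for the pairwise kappa shown here.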
Pages: 19