Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models

Cited by: 59
Authors
Organisciak, Peter [1,4]
Acar, Selcuk [2 ]
Dumas, Denis [3 ]
Berthiaume, Kelly [2 ]
Affiliations
[1] Univ Denver, Denver, CO USA
[2] Univ North Texas, Denton, TX 76203 USA
[3] Univ Georgia, Athens, GA USA
[4] Univ Denver, Dept Res Methods & Informat Sci, 1999 E Evans Ave, Denver, CO 80208 USA
Keywords
Divergent thinking; Alternate Uses Task; Large language models; Automated scoring; TORRANCE TESTS; CREATIVE-THINKING; ORDER; ACHIEVEMENT; IDEAS; TIME
DOI
10.1016/j.tsc.2023.101356
Chinese Library Classification
G40 [Education]
Discipline codes
040101; 120403
Abstract
Automated scoring for divergent thinking (DT) seeks to overcome a key obstacle to creativity measurement: the effort, cost, and reliability of scoring open-ended tests. For a common test of DT, the Alternate Uses Task (AUT), the primary automated approach casts the problem as one of semantic distance between a prompt and the resulting idea in a text model. This work presents an alternative approach that greatly surpasses the performance of the best existing semantic distance approaches. Our system, Ocsai, fine-tunes deep neural network-based large language models (LLMs) on human-judged responses. Trained and evaluated against one of the largest collections of human-judged AUT responses, comprising 27,000 responses collected from nine past studies, our fine-tuned LLMs achieved up to r = 0.81 correlation with human raters, greatly surpassing current systems (r = 0.12-0.26). Further, learning transfers well to new test items, and the approach remains robust with small numbers of training labels. We also compare prompt-based zero-shot and few-shot approaches using GPT-3, ChatGPT, and GPT-4. This work also suggests a limit to the underlying assumptions of the semantic distance model: a purely semantic approach that uses the stronger language representation of LLMs, while still improving on existing systems, does not achieve improvements comparable to our fine-tuned system. The increase in performance can support stronger applications and interventions in DT and opens automated DT scoring to new avenues for improving and understanding this branch of methods.
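As an illustrative sketch (not the authors' Ocsai implementation), the semantic-distance baseline the abstract contrasts against can be mimicked with toy embeddings: score each AUT response by its cosine distance from the prompt word, then benchmark the scores against human originality ratings with a Pearson correlation. All vectors, responses, and ratings below are hypothetical stand-ins for a real text model and real judges.

```python
# Sketch of semantic-distance scoring for an Alternate Uses Task (AUT) prompt,
# benchmarked against human ratings via Pearson's r. Toy data only.
import math

# Hypothetical embedding table standing in for a real text model.
EMB = {
    "brick":    [0.9, 0.1, 0.0],
    "build":    [0.8, 0.2, 0.1],   # common use -> close to "brick"
    "doorstop": [0.4, 0.5, 0.3],
    "weapon":   [0.1, 0.8, 0.4],   # unusual use -> far from "brick"
}

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def semantic_distance_score(prompt, response):
    """Greater distance from the prompt is read as greater originality."""
    return cosine_distance(EMB[prompt], EMB[response])

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

responses = ["build", "doorstop", "weapon"]
model_scores = [semantic_distance_score("brick", r) for r in responses]
human_ratings = [1.0, 2.5, 4.0]  # hypothetical 1-5 originality judgments
print(pearson_r(model_scores, human_ratings))
```

The fine-tuned approach the paper proposes replaces `semantic_distance_score` with a model trained directly on human judgments; the `pearson_r` benchmark against human raters is the same in either case.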
Pages: 17
Related papers (50 total; items [21]-[30] shown)
  • [21] Automated Topic Analysis with Large Language Models
    Kirilenko, Andrei
    Stepchenkova, Svetlana
    INFORMATION AND COMMUNICATION TECHNOLOGIES IN TOURISM 2024, ENTER 2024, 2024, : 29 - 34
  • [22] The Machines Take Over: A Comparison of Various Supervised Learning Approaches for Automated Scoring of Divergent Thinking Tasks
    Buczak, Philip
    Huang, He
    Forthmann, Boris
    Doebler, Philipp
    JOURNAL OF CREATIVE BEHAVIOR, 2023, 57 (01): : 17 - 36
  • [23] BRAINTEASER: Lateral Thinking Puzzles for Large Language Models
    Jiang, Yifan
    Ilievski, Filip
    Ma, Kaixin
    Sourati, Zhivar
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 14317 - 14332
  • [24] Semantic Mechanical Search with Large Vision and Language Models
    Sharma, Satvik
    Huang, Huang
    Shivakumar, Kaushik
    Chen, Lawrence Yunliang
    Hoque, Ryan
    Ichter, Brian
    Goldberg, Ken
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [25] LARGE MARGIN TRAINING IMPROVES LANGUAGE MODELS FOR ASR
    Wang, Jilin
    Huang, Jiaji
    Church, Kenneth Ward
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7368 - 7372
  • [26] Rationality of Thought Improves Reasoning in Large Language Models
    Gou, Tian
    Zhang, Boyao
    Sun, Zhenglie
    Wang, Jing
    Liu, Fang
    Wang, Yangang
    Wang, Jue
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT IV, KSEM 2024, 2024, 14887 : 343 - 358
  • [27] Large language models direct automated chemistry laboratory
    Dias, Ana Laura
    Rodrigues, Tiago
    NATURE, 2023, 624 (7992) : 530 - 531
  • [28] Automated Repair of Programs from Large Language Models
    National University of Singapore, Singapore
    arXiv
  • [29] Leveraging Large Language Models for Automated Dialogue Analysis
    Finch, Sarah E.
    Paek, Ellie S.
    Choi, Jinho D.
    24TH MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, SIGDIAL 2023, 2023, : 202 - 215