Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models

Cited by: 59
Authors
Organisciak, Peter [1 ,4 ]
Acar, Selcuk [2 ]
Dumas, Denis [3 ]
Berthiaume, Kelly [2 ]
Affiliations
[1] Univ Denver, Denver, CO USA
[2] Univ North Texas, Denton, TX 76203 USA
[3] Univ Georgia, Athens, GA USA
[4] Univ Denver, Dept Res Methods & Informat Sci, 1999 E Evans Ave, Denver, CO 80208 USA
Keywords
Divergent thinking; Alternate uses test; Large language models; Automated scoring; TORRANCE TESTS; CREATIVE-THINKING; ORDER; ACHIEVEMENT; IDEAS; TIME
DOI
10.1016/j.tsc.2023.101356
Chinese Library Classification
G40 [Education]
Discipline codes
040101; 120403
Abstract
Automated scoring for divergent thinking (DT) seeks to overcome a key obstacle to creativity measurement: the effort, cost, and reliability of scoring open-ended tests. For a common test of DT, the Alternate Uses Task (AUT), the primary automated approach casts the problem as a semantic distance between a prompt and the resulting idea in a text model. This work presents an alternative approach that greatly surpasses the performance of the best existing semantic distance approaches. Our system, Ocsai, fine-tunes deep neural network based large language models (LLMs) on human-judged responses. Trained and evaluated against one of the largest collections of human-judged AUT responses, with 27,000 responses collected from nine past studies, our fine-tuned LLMs achieved up to r = 0.81 correlation with human raters, greatly surpassing current systems (r = 0.12-0.26). Further, learning transfers well to new test items, and the approach remains robust with small numbers of training labels. We also compare prompt-based zero-shot and few-shot approaches, using GPT-3, ChatGPT, and GPT-4. This work also suggests a limit to the underlying assumptions of the semantic distance model, showing that a purely semantic approach that uses the stronger language representation of LLMs, while still improving on existing systems, does not achieve improvements comparable to our fine-tuned system. The increase in performance can support stronger applications and interventions in DT and opens automated DT scoring to new avenues for improving and understanding this branch of methods.
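The semantic-distance baseline that the abstract contrasts against scores an AUT response by how far it sits from the prompt in a text model's vector space: greater distance is read as greater originality. A minimal sketch of that idea, using a toy bag-of-words embedding as a stand-in for a real text model (the `embed` and `semantic_distance` functions here are illustrative assumptions, not the authors' Ocsai system or any specific prior scorer):

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a real text model's embedding:
    # a bag-of-words count vector over lowercase tokens.
    return Counter(text.lower().split())

def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(c * v[t] for t, c in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def semantic_distance(prompt, response):
    # The semantic-distance tradition treats 1 - similarity as an
    # originality proxy: responses farther from the prompt score higher.
    return 1.0 - cosine(embed(prompt), embed(response))

common = semantic_distance("brick", "build a wall with the brick")
novel = semantic_distance("brick", "grind it into pigment for paint")
```

With this toy embedding, the response that reuses the prompt word scores a smaller distance than the one sharing no vocabulary with it; the paper's point is that even with far stronger LLM embeddings in place of `embed`, this purely distance-based scheme trails a model fine-tuned directly on human originality judgments.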
Pages: 17
Related Papers
50 records in total
  • [1] Lost in translation? Not for Large Language Models: Automated divergent thinking scoring performance translates to non-English contexts
    Zielinska, Aleksandra
    Organisciak, Peter
    Dumas, Denis
    Karwowski, Maciej
    THINKING SKILLS AND CREATIVITY, 2023, 50
  • [2] Automatic Scoring of Verbal Divergent Thinking Tests: From Lexical Databases to Large Language Models
    Valueva, E. A.
    Panfilova, A. S.
    Rafikova, A. S.
    PSYCHOLOGY-JOURNAL OF THE HIGHER SCHOOL OF ECONOMICS, 2024, 21 (01): : 202 - 225
  • [3] Leveraging Large Language Models for Automated Chinese Essay Scoring
    Feng, Haiyue
    Du, Sixuan
    Zhu, Gaoxia
    Zou, Yan
    Poh Boon Phua
    Feng, Yuhong
    Zhong, Haoming
    Shen, Zhiqi
    Liu, Siyuan
    ARTIFICIAL INTELLIGENCE IN EDUCATION, PT I, AIED 2024, 2024, 14829 : 454 - 467
  • [4] Probing the "Creativity" of Large Language Models: Can Models Produce Divergent Semantic Association?
    Chen, Honghua
    Ding, Nai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 12881 - 12888
  • [5] SCALM: Towards Semantic Caching for Automated Chat Services with Large Language Models
    Li, Jiaxing
    Xu, Chi
    Wang, Feng
    von Riedemann, Isaac M.
    Zhang, Cong
    Liu, Jiangchuan
2024 IEEE/ACM 32ND INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE, IWQOS, 2024
  • [6] Language models in automated essay scoring: Insights for the Turkish language
    Firoozi, Tahereh
    Bulut, Okan
    Gierl, Mark J.
    INTERNATIONAL JOURNAL OF ASSESSMENT TOOLS IN EDUCATION, 2023, 10 : 148 - 162
  • [7] Large language models and automated essay scoring of English language learner writing: Insights into validity and reliability
Pack, A.
Barrett, A.
Escalante, J.
    Computers and Education: Artificial Intelligence, 2024, 6
  • [8] Automated Scoring of Constructed Response Items in Math Assessment Using Large Language Models
    Morris, Wesley
    Holmes, Langdon
    Choi, Joon Suh
    Crossley, Scott
INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION, 2024
  • [9] Applying large language models for automated essay scoring for non-native Japanese
    Li, Wenchao
    Liu, Haitao
    HUMANITIES & SOCIAL SCIENCES COMMUNICATIONS, 2024, 11 (01):
  • [10] Automated Essay Scoring and Revising Based on Open-Source Large Language Models
    Song, Yishen
    Zhu, Qianta
    Wang, Huaibo
    Zheng, Qinhua
    IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES, 2024, 17 : 1920 - 1930