Framework-based qualitative analysis of free responses of Large Language Models: Algorithmic fidelity

Cited: 1
|
Authors
Amirova, Aliya [1 ]
Fteropoulli, Theodora [2 ]
Ahmed, Nafiso [3 ]
Cowie, Martin R. [4 ,5 ]
Leibo, Joel Z. [6 ,7 ]
Affiliations
[1] Kings Coll London, Fac Life Sci & Med, Sch Life Course & Populat Sci, Populat Hlth Sci, London, England
[2] Univ Cyprus, Med Sch, Nicosia, Cyprus
[3] UCL, Div Psychiat, London, England
[4] Royal Brompton Hosp, London, England
[5] Kings Coll London, Fac Life Sci & Med, Sch Cardiovasc Med & Sci, London, England
[6] Google DeepMind, London, England
[7] Kings Coll London, Fac Nat Math & Engn Sci, Dept Informat, London, England
Source
PLOS ONE | 2024, Vol. 19, Issue 03
DOI
10.1371/journal.pone.0300024
Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general];
Discipline classification codes
07; 0710; 09;
Abstract
Today, with the advent of large-scale generative Language Models (LLMs), it is possible to simulate free responses to interview questions such as those traditionally analyzed using qualitative research methods. Qualitative methodology encompasses a broad family of techniques involving manual analysis of open-ended interviews or conversations conducted freely in natural language. Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative analysis methods in such a way as to generate insights that could generalize to real human populations. The key concept in our analysis is algorithmic fidelity, a validity concept capturing the degree to which LLM-generated outputs mirror human sub-populations' beliefs and attitudes. By definition, high algorithmic fidelity suggests that latent beliefs elicited from LLMs may generalize to real humans, whereas low algorithmic fidelity renders such research invalid. We used an LLM to generate interviews with "silicon participants" matching specific demographic characteristics one-for-one with a set of human participants. Using framework-based qualitative analysis, we showed that the key themes obtained from human and silicon participants were strikingly similar. However, when we analyzed the structure and tone of the interviews, we found even more striking differences. We also found evidence of a hyper-accuracy distortion. We conclude that the LLM we tested (GPT-3.5) does not have sufficient algorithmic fidelity for in silico research conducted on it to be expected to generalize to real human populations. However, rapid advances in artificial intelligence raise the possibility that algorithmic fidelity may improve in the future. We therefore stress the need to establish epistemic norms now around how to assess the validity of LLM-based qualitative research, especially concerning the need to ensure the representation of heterogeneous lived experiences.
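The demographic matching the abstract describes can be pictured as conditioning each simulated interviewee on the profile of one matched human participant before posing an interview question. The sketch below is purely illustrative: the function name, demographic fields, and prompt wording are assumptions, not the authors' actual protocol or code.

```python
# Hypothetical sketch of persona-conditioned "silicon participant" prompting.
# All names and fields are illustrative; the paper does not publish its prompts.

def build_persona_prompt(demographics: dict, question: str) -> str:
    """Compose a conditioning prompt for one simulated interviewee,
    matched one-for-one to a human participant's demographic profile."""
    profile = ", ".join(f"{k}: {v}" for k, v in demographics.items())
    return (
        f"You are a study participant with the following profile: {profile}. "
        f"Answer the interview question in your own words.\n"
        f"Question: {question}"
    )

# One matched profile; the resulting prompt would be sent to the LLM
# (e.g. GPT-3.5) and its free response analyzed qualitatively.
participant = {"age": 67, "gender": "female", "diagnosis": "heart failure"}
prompt = build_persona_prompt(
    participant, "What makes physical activity difficult for you?"
)
```

In this framing, algorithmic fidelity is then assessed by comparing themes, structure, and tone of the LLM's free responses against those of the matched human interviews.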
Pages: 33
Related Papers
50 records
  • [41] Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models
    Zhang, Weigang
    Zhou, Biyu
    Wu, Xing
    Gao, Chaochen
    Liu, Zhibing
    Tang, Xuehai
    Li, Ruixuan
    Han, Jizhong
    Hu, Songlin
    EURO-PAR 2024: PARALLEL PROCESSING, PART II, EURO-PAR 2024, 2024, 14802 : 424 - 438
  • [42] A Framework for Enhancing Statute Law Retrieval Using Large Language Models
    Pham, Trang Ngoc Anh
    Do, Dinh-Truong
    Nguyen, Minh Le
    NEW FRONTIERS IN ARTIFICIAL INTELLIGENCE, JSAI-ISAI 2024, 2024, 14741 : 247 - 259
  • [43] UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models
    Liu, Qi
    He, Yongyi
    Xu, Tong
    Lian, Defu
    Liu, Che
    Zheng, Zhi
    Chen, Enhong
    INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, PROCEEDINGS, : 1909 - 1919
  • [44] FEEL: A Framework for Evaluating Emotional Support Capability with Large Language Models
    Zhang, Huaiwen
    Chen, Yu
    Wang, Ming
    Feng, Shi
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XIII, ICIC 2024, 2024, 14874 : 96 - 107
  • [45] A Review of Large Language Models in Healthcare: Taxonomy, Threats, Vulnerabilities, and Framework
    Hamid, Rida
    Brohi, Sarfraz
    BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (11)
  • [46] Evaluating Large Language Models for Enhanced Fuzzing: An Analysis Framework for LLM-Driven Seed Generation
    Black, Gavin
    Vaidyan, Varghese Mathew
    Comert, Gurcan
    IEEE ACCESS, 2024, 12 : 156065 - 156081
  • [47] Exploring the Performance of Large Language Models for Data Analysis Tasks Through the CRISP-DM Framework
    Musazade, Nurlan
    Mezei, Jozsef
    Wang, Xiaolu
    GOOD PRACTICES AND NEW PERSPECTIVES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 5, WORLDCIST 2024, 2024, 989 : 56 - 65
  • [48] Leveraging large language models for tourism research based on 5D framework: A collaborative analysis of tourist sentiments and spatial features
    Rui, Jin
    Xu, Yuhan
    Cai, Chenfan
    Li, Xiang
    TOURISM MANAGEMENT, 2025, 108
  • [49] Exploring the Responses of Large Language Models to Beginner Programmers' Help Requests
    Hellas, Arto
    Leinonen, Juho
    Sarsa, Sami
    Koutcheme, Charles
    Kujanpaa, Lilja
    Sorva, Juha
    PROCEEDINGS OF THE 2023 ACM CONFERENCE ON INTERNATIONAL COMPUTING EDUCATION RESEARCH V.1, ICER 2023 V1, 2023, : 93 - 105
  • [50] Evaluating the Accuracy of Responses by Large Language Models for Information on Disease Epidemiology
    Zhu, Kexin
    Zhang, Jiajie
    Klishin, Anton
    Esser, Mario
    Blumentals, William A.
    Juhaeri, Juhaeri
    Jouquelet-Royer, Corinne
    Sinnott, Sarah-Jo
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2025, 34 (02)