Framework-based qualitative analysis of free responses of Large Language Models: Algorithmic fidelity

Cited by: 1
Authors
Amirova, Aliya [1 ]
Fteropoulli, Theodora [2 ]
Ahmed, Nafiso [3 ]
Cowie, Martin R. [4 ,5 ]
Leibo, Joel Z. [6 ,7 ]
Affiliations
[1] Kings Coll London, Fac Life Sci & Med, Sch Life Course & Populat Sci, Populat Hlth Sci, London, England
[2] Univ Cyprus, Med Sch, Nicosia, Cyprus
[3] UCL, Div Psychiat, London, England
[4] Royal Brompton Hosp, London, England
[5] Kings Coll London, Fac Life Sci & Med, Sch Cardiovasc Med & Sci, London, England
[6] Google DeepMind, London, England
[7] Kings Coll London, Fac Nat Math & Engn Sci, Dept Informat, London, England
Source
PLOS ONE | 2024, Vol. 19, Issue 03
DOI
10.1371/journal.pone.0300024
Abstract
Today, with the advent of large-scale generative language models (LLMs), it is now possible to simulate free responses to interview questions such as those traditionally analyzed using qualitative research methods. Qualitative methodology encompasses a broad family of techniques involving manual analysis of open-ended interviews or conversations conducted freely in natural language. Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative analysis methods in such a way as to generate insights that could generalize to real human populations. The key concept in our analysis is algorithmic fidelity, a validity concept capturing the degree to which LLM-generated outputs mirror human sub-populations' beliefs and attitudes. By definition, high algorithmic fidelity suggests that latent beliefs elicited from LLMs may generalize to real humans, whereas low algorithmic fidelity renders such research invalid. Here we used an LLM to generate interviews with "silicon participants" matching specific demographic characteristics one-for-one with a set of human participants. Using framework-based qualitative analysis, we showed that the key themes obtained from both human and silicon participants were strikingly similar. However, when we analyzed the structure and tone of the interviews, we found even more striking differences. We also found evidence of a hyper-accuracy distortion. We conclude that the LLM we tested (GPT-3.5) does not have sufficient algorithmic fidelity to expect in silico research on it to generalize to real human populations. However, rapid advances in artificial intelligence raise the possibility that algorithmic fidelity may improve in the future. Thus we stress the need to establish epistemic norms now around how to assess the validity of LLM-based qualitative research, especially concerning the need to ensure the representation of heterogeneous lived experiences.
Pages: 33
Related Papers (50 total)
  • [21] Leveraging large language models for generating responses to patient messages-a subjective analysis
    Liu, Siru
    Mccoy, Allison B.
    Wright, Aileen P.
    Carew, Babatunde
    Genkins, Julian Z.
    Huang, Sean S.
    Peterson, Josh F.
    Steitz, Bryan
    Wright, Adam
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (06) : 1367 - 1379
  • [22] The Haptic Fidelity Framework: A Qualitative Overview and Categorization of Cutaneous-Based Haptic Technologies Through Fidelity
    Breitschaft, Stefan Josef
    Heijboer, Stefan
    Shor, Daniel
    Tempelman, Erik
    Vink, Peter
    Carbon, Claus-Christian
    IEEE TRANSACTIONS ON HAPTICS, 2022, 15 (02) : 232 - 245
  • [23] "Conversing" With Qualitative Data: Enhancing Qualitative Research Through Large Language Models (LLMs)
    Hayes, Adam S.
    INTERNATIONAL JOURNAL OF QUALITATIVE METHODS, 2025, 24
  • [24] Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models
    Ko, Hyung-Kwon
    Jeon, Hyeon
    Park, Gwanmo
    Kim, Dae Hyun
    Kim, Nam Wook
    Kim, Juho
    Seo, Jinwook
    PROCEEDINGS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI 2024), 2024,
  • [25] Framework for evaluating code generation ability of large language models
    Yeo, Sangyeop
    Ma, Yu-Seung
    Kim, Sang Cheol
    Jun, Hyungkook
    Kim, Taeho
    ETRI JOURNAL, 2024, 46 (01) : 106 - 117
  • [26] IterClean: An Iterative Data Cleaning Framework with Large Language Models
    Ni, Wei
    Zhang, Kaihang
    Miao, Xiaoye
    Zhao, Xiangyu
    Wu, Yangyang
    Yin, Jianwei
    PROCEEDINGS OF THE ACM TURING AWARD CELEBRATION CONFERENCE-CHINA 2024, ACM-TURC 2024, 2024, : 100 - 105
  • [27] A hybrid framework with large language models for rare disease phenotyping
    Wu, Jinge
    Dong, Hang
    Li, Zexi
    Wang, Haowei
    Li, Runci
    Patra, Arijit
    Dai, Chengliang
    Ali, Waqar
    Scordis, Phil
    Wu, Honghan
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2024, 24 (01)
  • [28] Feature based Algorithmic Analysis on American Sign Language Dataset
    Butt, Umair Muneer
    Husnain, Basharat
    Ahmed, Usman
    Tariq, Arslan
    Tariq, Iqra
    Butt, Muhammad Aadil
    Zia, Muhammad Sultan
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (05) : 583 - 589
  • [29] Pyrolysis-free covalent organic framework-based materials for efficient oxygen electrocatalysis
    Cui, Xun
    Gao, Likun
    Ma, Rui
    Wei, Zhengnan
    Lu, Cheng-Hsin
    Li, Zili
    Yang, Yingkui
    JOURNAL OF MATERIALS CHEMISTRY A, 2021, 9 (37) : 20985 - 21004
  • [30] Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding
    Xiao, Ziang
    Yuan, Xingdi
    Liao, Q. Vera
    Abdelghani, Rania
    Oudeyer, Pierre-Yves
    COMPANION PROCEEDINGS OF 2023 28TH ANNUAL CONFERENCE ON INTELLIGENT USER INTERFACES, IUI 2023 COMPANION, 2023, : 75 - 78