Framework-based qualitative analysis of free responses of Large Language Models: Algorithmic fidelity

Cited by: 1
Authors
Amirova, Aliya [1]
Fteropoulli, Theodora [2]
Ahmed, Nafiso [3]
Cowie, Martin R. [4, 5]
Leibo, Joel Z. [6, 7]
Affiliations
[1] Kings Coll London, Fac Life Sci & Med, Sch Life Course & Populat Sci, Populat Hlth Sci, London, England
[2] Univ Cyprus, Med Sch, Nicosia, Cyprus
[3] UCL, Div Psychiat, London, England
[4] Royal Brompton Hosp, London, England
[5] Kings Coll London, Fac Life Sci & Med, Sch Cardiovasc Med & Sci, London, England
[6] Google DeepMind, London, England
[7] Kings Coll London, Fac Nat Math & Engn Sci, Dept Informat, London, England
Source
PLOS ONE | 2024 / Vol. 19 / Issue 03
Keywords
WATER GAS-EXCHANGE; CARBON-DIOXIDE; MASS-TRANSFER; BOREAL ZONE; BROWN-WATER; RAIN; SURFACE; WIND; DISSIPATION; TURBULENCE;
DOI
10.1371/journal.pone.0300024
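Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]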
Discipline classification codes
07; 0710; 09
Abstract
Today, with the advent of Large-scale generative Language Models (LLMs) it is now possible to simulate free responses to interview questions such as those traditionally analyzed using qualitative research methods. Qualitative methodology encompasses a broad family of techniques involving manual analysis of open-ended interviews or conversations conducted freely in natural language. Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative analysis methods in such a way as to generate insights that could generalize to real human populations. The key concept in our analysis is algorithmic fidelity, a validity concept capturing the degree to which LLM-generated outputs mirror human sub-populations' beliefs and attitudes. By definition, high algorithmic fidelity suggests that latent beliefs elicited from LLMs may generalize to real humans, whereas low algorithmic fidelity renders such research invalid. Here we used an LLM to generate interviews with "silicon participants" matching specific demographic characteristics one-for-one with a set of human participants. Using framework-based qualitative analysis, we showed the key themes obtained from both human and silicon participants were strikingly similar. However, when we analyzed the structure and tone of the interviews we found even more striking differences. We also found evidence of a hyper-accuracy distortion. We conclude that the LLM we tested (GPT-3.5) does not have sufficient algorithmic fidelity to expect in silico research on it to generalize to real human populations. However, rapid advances in artificial intelligence raise the possibility that algorithmic fidelity may improve in the future. Thus we stress the need to establish epistemic norms now around how to assess the validity of LLM-based qualitative research, especially concerning the need to ensure the representation of heterogeneous lived experiences.
Pages: 33
Related papers
50 in total
  • [31] A General Framework for Naming Qualitative Models Based on Intervals
    Martinez-Martin, Ester
    Teresa Escrig, M.
    del Pobil, Angel P.
    DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 2012, 151 : 681 - 688
  • [32] Trend Analysis Through Large Language Models
    Alzapiedi, Lucas
    Bihl, Trevor
    IEEE NATIONAL AEROSPACE AND ELECTRONICS CONFERENCE, NAECON 2024, 2024, : 370 - 374
  • [33] Automated Topic Analysis with Large Language Models
    Kirilenko, Andrei
    Stepchenkova, Svetlana
    INFORMATION AND COMMUNICATION TECHNOLOGIES IN TOURISM 2024, ENTER 2024, 2024, : 29 - 34
  • [34] Multimodal large language models for bioimage analysis
    Zhang, Shanghang
    Dai, Gaole
    Huang, Tiejun
    Chen, Jianxu
    NATURE METHODS, 2024, 21 (08) : 1390 - 1393
  • [35] Large language models driven BIM-based DfMA method for free-form prefabricated buildings: framework and a usefulness case study
    Han, Dongchen
    Zhao, Wuji
    Yin, Hongxi
    Qu, Ming
    Zhu, Jian
    Ma, Feifan
    Ying, Yuejia
    Pan, Annika
    JOURNAL OF ASIAN ARCHITECTURE AND BUILDING ENGINEERING, 2024,
  • [36] Integrating large language models in mental health practice: a qualitative descriptive study based on expert interviews
    Ma, Yingzhuo
    Zeng, Yi
    Liu, Tong
    Sun, Ruoshan
    Xiao, Mingzhao
    Wang, Jun
    FRONTIERS IN PUBLIC HEALTH, 2024, 12
  • [37] A review of mixed-integer linear formulations for framework-based energy system models
    Hoffmann, Maximilian
    Schyska, Bruno U.
    Bartels, Julian
    Pelser, Tristan
    Behrens, Johannes
    Wetzel, Manuel
    Gils, Hans Christian
    Tang, Chuen-Fung
    Tillmanns, Marius
    Stock, Jan
    Xhonneux, Andre
    Kotzur, Leander
    Praktiknjo, Aaron
    Vogt, Thomas
    Jochem, Patrick
    Linssen, Jochen
    Weinand, Jann M.
    Stolten, Detlef
    ADVANCES IN APPLIED ENERGY, 2024, 16
  • [38] Algorithmic Ghost in the Research Shell: Large Language Models and Academic Knowledge Creation in Management Research
    Williams, Nigel
    Ivanov, Stanislav
    Buhalis, Dimitrios
    arXiv, 2023,
  • [39] Identifying contextual effective factors on total fertility rate decline in Iran: a qualitative framework-based study
    Jafari, H.
    Pourreza, A.
    Sadeghi, A.
    Alizadeh, G.
    Khodayari-Zarnaq, R.
    Quality & Quantity, 2022, 56 (5) : 3395 - 3412
  • [40] A framework for neurosymbolic robot action planning using large language models
    Capitanelli, Alessio
    Mastrogiovanni, Fulvio
    FRONTIERS IN NEUROROBOTICS, 2024, 18