Framework-based qualitative analysis of free responses of Large Language Models: Algorithmic fidelity

Cited by: 1
Authors
Amirova, Aliya [1]
Fteropoulli, Theodora [2]
Ahmed, Nafiso [3]
Cowie, Martin R. [4, 5]
Leibo, Joel Z. [6, 7]
Affiliations
[1] Kings Coll London, Fac Life Sci & Med, Sch Life Course & Populat Sci, Populat Hlth Sci, London, England
[2] Univ Cyprus, Med Sch, Nicosia, Cyprus
[3] UCL, Div Psychiat, London, England
[4] Royal Brompton Hosp, London, England
[5] Kings Coll London, Fac Life Sci & Med, Sch Cardiovasc Med & Sci, London, England
[6] Google DeepMind, London, England
[7] Kings Coll London, Fac Nat Math & Engn Sci, Dept Informat, London, England
Source
PLOS ONE | 2024 / Vol. 19 / Issue 03
Keywords
WATER GAS-EXCHANGE; CARBON-DIOXIDE; MASS-TRANSFER; BOREAL ZONE; BROWN-WATER; RAIN; SURFACE; WIND; DISSIPATION; TURBULENCE;
DOI
10.1371/journal.pone.0300024
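Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract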
Today, with the advent of Large-scale generative Language Models (LLMs) it is now possible to simulate free responses to interview questions such as those traditionally analyzed using qualitative research methods. Qualitative methodology encompasses a broad family of techniques involving manual analysis of open-ended interviews or conversations conducted freely in natural language. Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative analysis methods in such a way as to generate insights that could generalize to real human populations. The key concept in our analysis is algorithmic fidelity, a validity concept capturing the degree to which LLM-generated outputs mirror human sub-populations' beliefs and attitudes. By definition, high algorithmic fidelity suggests that latent beliefs elicited from LLMs may generalize to real humans, whereas low algorithmic fidelity renders such research invalid. Here we used an LLM to generate interviews with "silicon participants" matching specific demographic characteristics one-for-one with a set of human participants. Using framework-based qualitative analysis, we showed the key themes obtained from both human and silicon participants were strikingly similar. However, when we analyzed the structure and tone of the interviews we found even more striking differences. We also found evidence of a hyper-accuracy distortion. We conclude that the LLM we tested (GPT-3.5) does not have sufficient algorithmic fidelity to expect in silico research on it to generalize to real human populations. However, rapid advances in artificial intelligence raise the possibility that algorithmic fidelity may improve in the future. Thus we stress the need to establish epistemic norms now around how to assess the validity of LLM-based qualitative research, especially concerning the need to ensure the representation of heterogeneous lived experiences.
Pages: 33