Framework-based qualitative analysis of free responses of Large Language Models: Algorithmic fidelity

Cited: 1
|
Authors
Amirova, Aliya [1 ]
Fteropoulli, Theodora [2 ]
Ahmed, Nafiso [3 ]
Cowie, Martin R. [4 ,5 ]
Leibo, Joel Z. [6 ,7 ]
Affiliations
[1] Kings Coll London, Fac Life Sci & Med, Sch Life Course & Populat Sci, Populat Hlth Sci, London, England
[2] Univ Cyprus, Med Sch, Nicosia, Cyprus
[3] UCL, Div Psychiat, London, England
[4] Royal Brompton Hosp, London, England
[5] Kings Coll London, Fac Life Sci & Med, Sch Cardiovasc Med & Sci, London, England
[6] Google DeepMind, London, England
[7] Kings Coll London, Fac Nat Math & Engn Sci, Dept Informat, London, England
Source
PLOS ONE | 2024, Vol. 19, Issue 03
DOI
10.1371/journal.pone.0300024
Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general];
Discipline classification codes
07; 0710; 09;
Abstract
Today, with the advent of large-scale generative Language Models (LLMs), it is possible to simulate free responses to interview questions such as those traditionally analyzed using qualitative research methods. Qualitative methodology encompasses a broad family of techniques involving manual analysis of open-ended interviews or conversations conducted freely in natural language. Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative analysis methods in such a way as to generate insights that could generalize to real human populations. The key concept in our analysis is algorithmic fidelity, a validity concept capturing the degree to which LLM-generated outputs mirror human sub-populations' beliefs and attitudes. By definition, high algorithmic fidelity suggests that latent beliefs elicited from LLMs may generalize to real humans, whereas low algorithmic fidelity renders such research invalid. We used an LLM to generate interviews with "silicon participants" matching specific demographic characteristics one-for-one with a set of human participants. Using framework-based qualitative analysis, we showed that the key themes obtained from human and silicon participants were strikingly similar. However, when we analyzed the structure and tone of the interviews, we found even more striking differences. We also found evidence of a hyper-accuracy distortion. We conclude that the LLM we tested (GPT-3.5) does not have sufficient algorithmic fidelity for in silico research conducted on it to be expected to generalize to real human populations. However, rapid advances in artificial intelligence raise the possibility that algorithmic fidelity may improve in the future. We therefore stress the need to establish epistemic norms now around how to assess the validity of LLM-based qualitative research, especially concerning the need to ensure the representation of heterogeneous lived experiences.
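The demographic matching the abstract describes can be pictured as conditioning each simulated interviewee on the profile of one matched human participant before posing an interview question. The sketch below is purely illustrative: the function name, demographic fields, and prompt wording are assumptions, not the authors' actual protocol or code.

```python
# Hypothetical sketch of persona-conditioned "silicon participant" prompting.
# All names and fields are illustrative; the paper does not publish its prompts.

def build_persona_prompt(demographics: dict, question: str) -> str:
    """Compose a conditioning prompt for one simulated interviewee,
    matched one-for-one to a human participant's demographic profile."""
    profile = ", ".join(f"{k}: {v}" for k, v in demographics.items())
    return (
        f"You are a study participant with the following profile: {profile}. "
        f"Answer the interview question in your own words.\n"
        f"Question: {question}"
    )

# One matched profile; the resulting prompt would be sent to the LLM
# (e.g. GPT-3.5) and its free response analyzed qualitatively.
participant = {"age": 67, "gender": "female", "diagnosis": "heart failure"}
prompt = build_persona_prompt(
    participant, "What makes physical activity difficult for you?"
)
```

In this framing, algorithmic fidelity is then assessed by comparing themes, structure, and tone of the LLM's free responses against those of the matched human interviews.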
Pages: 33
Related Papers
50 records
  • [41] Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models
    Zhang, Weigang
    Zhou, Biyu
    Wu, Xing
    Gao, Chaochen
    Liu, Zhibing
    Tang, Xuehai
    Li, Ruixuan
    Han, Jizhong
    Hu, Songlin
    EURO-PAR 2024: PARALLEL PROCESSING, PART II, EURO-PAR 2024, 2024, 14802 : 424 - 438
  • [42] A Framework for Enhancing Statute Law Retrieval Using Large Language Models
    Pham, Trang Ngoc Anh
    Do, Dinh-Truong
    Nguyen, Minh Le
    NEW FRONTIERS IN ARTIFICIAL INTELLIGENCE, JSAI-ISAI 2024, 2024, 14741 : 247 - 259
  • [43] UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models
    Liu, Qi
    He, Yongyi
    Xu, Tong
    Lian, Defu
    Liu, Che
    Zheng, Zhi
    Chen, Enhong
    INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, PROCEEDINGS, : 1909 - 1919
  • [44] FEEL: A Framework for Evaluating Emotional Support Capability with Large Language Models
    Zhang, Huaiwen
    Chen, Yu
    Wang, Ming
    Feng, Shi
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XIII, ICIC 2024, 2024, 14874 : 96 - 107
  • [45] A Review of Large Language Models in Healthcare: Taxonomy, Threats, Vulnerabilities, and Framework
    Hamid, Rida
    Brohi, Sarfraz
    BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (11)
  • [46] Evaluating Large Language Models for Enhanced Fuzzing: An Analysis Framework for LLM-Driven Seed Generation
    Black, Gavin
    Vaidyan, Varghese Mathew
    Comert, Gurcan
    IEEE ACCESS, 2024, 12 : 156065 - 156081
  • [47] Exploring the Performance of Large Language Models for Data Analysis Tasks Through the CRISP-DM Framework
    Musazade, Nurlan
    Mezei, Jozsef
    Wang, Xiaolu
    GOOD PRACTICES AND NEW PERSPECTIVES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 5, WORLDCIST 2024, 2024, 989 : 56 - 65
  • [48] Leveraging large language models for tourism research based on 5D framework: A collaborative analysis of tourist sentiments and spatial features
    Rui, Jin
    Xu, Yuhan
    Cai, Chenfan
    Li, Xiang
    TOURISM MANAGEMENT, 2025, 108
  • [49] Exploring the Responses of Large Language Models to Beginner Programmers' Help Requests
    Hellas, Arto
    Leinonen, Juho
    Sarsa, Sami
    Koutcheme, Charles
    Kujanpaa, Lilja
    Sorva, Juha
    PROCEEDINGS OF THE 2023 ACM CONFERENCE ON INTERNATIONAL COMPUTING EDUCATION RESEARCH V.1, ICER 2023 V1, 2023, : 93 - 105
  • [50] Evaluating the Accuracy of Responses by Large Language Models for Information on Disease Epidemiology
    Zhu, Kexin
    Zhang, Jiajie
    Klishin, Anton
    Esser, Mario
    Blumentals, William A.
    Juhaeri, Juhaeri
    Jouquelet-Royer, Corinne
    Sinnott, Sarah-Jo
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2025, 34 (02)