Comparing Generative AI Literature Reviews Versus Human-Led Systematic Literature Reviews: A Case Study on Big Data Research

Cited by: 0
Authors
Tosi, Davide [1 ]
Affiliations
[1] Univ Insubria, Dept Theoret & Appl Sci, I-20110 Varese, Italy
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Big Data; Artificial intelligence; Real-time systems; Accuracy; Manuals; Generative AI; Finance; Scalability; AI-assisted research; big data; generative AI; large language models; systematic literature review; MANAGEMENT
DOI
10.1109/ACCESS.2025.3554504
CLC number (Chinese Library Classification)
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) are transforming research methodologies, including Systematic Literature Reviews (SLRs). While traditional, human-led SLRs are labor-intensive, AI-driven approaches promise efficiency and scalability. However, the reliability and accuracy of AI-generated literature reviews remain uncertain. This study investigates the performance of GPT-4-powered Consensus in conducting an SLR on Big Data research, comparing its results with a manually conducted SLR. To evaluate Consensus, we analyzed its ability to detect relevant studies, extract key insights, and synthesize findings. Our human-led SLR identified 32 primary studies (PSs) and 207 related works, whereas Consensus detected 22 PSs, with 16 overlapping with the manual selection and 5 false positives. The AI-selected studies had an average citation count of 202 per study, significantly higher than the 64.4 citations per study in the manual SLR, indicating a possible bias toward highly cited papers. However, none of the 32 PSs selected manually were included in the AI-generated results, highlighting recall and selection accuracy limitations. Key findings reveal that Consensus accelerates literature retrieval but suffers from hallucinations, reference inaccuracies, and limited critical analysis. Specifically, it failed to capture nuanced research challenges and missed important application domains. Precision, recall, and F1 scores of the AI-selected studies were 76.2%, 38.1%, and 50.6%, respectively, demonstrating that while AI retrieves relevant papers with high precision, it lacks comprehensiveness. To mitigate these limitations, we propose a hybrid AI-human SLR framework, where AI enhances search efficiency while human reviewers ensure rigor and validity. While AI can support literature reviews, human oversight remains essential for ensuring accuracy and depth. Future research should assess AI-assisted SLRs across multiple disciplines to validate generalizability and explore domain-specific LLMs for improved performance.
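The precision, recall, and F1 figures in the abstract follow the standard retrieval-metric definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two). The Python sketch below is illustrative only, not code from the paper: it plugs in the 16 overlapping studies (true positives) and 5 false positives quoted above, while the false-negative count of 26 is an assumption chosen here so that the reported ~76.2% precision and ~38.1% recall are reproduced; the resulting F1 (~50.8%) is close to the reported 50.6%.

    # Illustrative sketch (not from the paper): standard precision / recall / F1
    # computed from true-positive (tp), false-positive (fp), and false-negative (fn) counts.
    def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        return precision, recall, f1

    # tp=16 and fp=5 are taken from the abstract; fn=26 is an assumed value that
    # reproduces the reported precision and recall percentages.
    p, r, f1 = precision_recall_f1(tp=16, fp=5, fn=26)
    print(f"precision={p:.1%}, recall={r:.1%}, f1={f1:.1%}")
    # -> precision=76.2%, recall=38.1%, f1=50.8%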
Pages: 56210 - 56219
Page count: 10
Related papers
50 records in total
  • [41] Psychometric testing of SPIDER: Data capture tool for systematic literature reviews
    Classen, Sherrilene
    Winter, Sandra
    Awadzi, Kezia D.
    Garvan, Cynthia W.
    Lopez, Ellen D. S.
    Sundaram, Swathy
    AMERICAN JOURNAL OF OCCUPATIONAL THERAPY, 2008, 62 (03): 335-342
  • [42] Perinatal unusual rhabdomyoma location - case report and systematic reviews of the literature
    de Melo Bezerra Cavalcante, Candice Torres
    Pinto Junior, Valdester Cavalcante
    Pompeu, Ronald Guedes
    de Oliveira Teles, Andrea Consuelo
    Bandeira, Jeanne Araujo
    Leite Maia, Isabel Cristina
    Fernandes Tavora, Fabio Rocha
    Cavalcante, Marcelo Borges
    Perez Zamarian, Ana Cristina
    Araujo Junior, Edward
    Castello Branco, Klebia Magalhaes
    JOURNAL OF MATERNAL-FETAL & NEONATAL MEDICINE, 2021, 34 (01): 137-151
  • [43] New Section Editors for Systematic Literature Reviews and Case Reports for the Laryngoscope
    Selesnick, Samuel H. H.
    LARYNGOSCOPE, 2023, 133 (05): 1001
  • [44] AN APPROACH TO COMPARING NATIONS FOR INCLUSION OF STUDIES IN HEALTH-BASED SYSTEMATIC LITERATURE REVIEWS
    Deonandan, R.
    Schachter, H.
    Ly, M.
    Girardi, A.
    Lacroix, D.
    Barrowman, N.
    Moore, C.
    Abdulkadir, I.
    AMERICAN JOURNAL OF EPIDEMIOLOGY, 2011, 173: S29
  • [45] A case study of misrepresentation of the scientific literature: Recent reviews of chiropractic
    Morley, J
    Rosner, AL
    Redwood, D
    JOURNAL OF ALTERNATIVE AND COMPLEMENTARY MEDICINE, 2001, 7 (01): 65-78
  • [46] A systematic mapping on the use of visual data mining to support the conduct of systematic literature reviews
    Felizardo, Katia R.
    Macdonell, Stephen G.
    Mendes, Emília
    Maldonado, José Carlos
    JOURNAL OF SOFTWARE, 2012, 7 (02): 450-461
  • [47] Evaluating Literature Reviews Conducted by Humans Versus ChatGPT: Comparative Study
    Mostafapour, Mehrnaz
    Fortier, Jacqueline H.
    Pacheco, Karen
    Murray, Heather
    Garber, Gary
    JMIR AI, 2024, 3
  • [48] Assessing Quality in Systematic Literature Reviews: A Study of Novice Rater Training
    Acosta, Sandra
    Garza, Tiberio
    Hsu, Hsien-Yuan
    Goodson, Patricia
    SAGE OPEN, 2020, 10 (03)
  • [49] Iterative guided machine learning-assisted systematic literature reviews: a diabetes case study
    Zimmerman, John
    Soler, Robin E.
    Lavinder, James
    Murphy, Sarah
    Atkins, Charisma
    Hulbert, LaShonda
    Lusk, Richard
    Ng, Boon Peng
    SYSTEMATIC REVIEWS, 10
  • [50] Towards a Semi-Automated Approach for Systematic Literature Reviews Completed Research
    Denzler, Tim
    Enders, Martin Robert
    Akello, Patricia
    DIGITAL INNOVATION AND ENTREPRENEURSHIP (AMCIS 2021), 2021