Comparing Generative AI Literature Reviews Versus Human-Led Systematic Literature Reviews: A Case Study on Big Data Research

Times cited: 0
Author
Tosi, Davide [1]
Affiliation
[1] Univ Insubria, Dept Theoret & Appl Sci, I-20110 Varese, Italy
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Big Data; Artificial intelligence; Real-time systems; Accuracy; Manuals; Generative AI; Finance; Scalability; AI-assisted research; big data; generative AI; large language models; systematic literature review; MANAGEMENT;
DOI
10.1109/ACCESS.2025.3554504
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) are transforming research methodologies, including Systematic Literature Reviews (SLRs). While traditional, human-led SLRs are labor-intensive, AI-driven approaches promise efficiency and scalability. However, the reliability and accuracy of AI-generated literature reviews remain uncertain. This study investigates the performance of GPT-4-powered Consensus in conducting an SLR on Big Data research, comparing its results with a manually conducted SLR. To evaluate Consensus, we analyzed its ability to detect relevant studies, extract key insights, and synthesize findings. Our human-led SLR identified 32 primary studies (PSs) and 207 related works, whereas Consensus detected 22 PSs, with 16 overlapping with the manual selection and 5 false positives. The AI-selected studies had an average citation count of 202 per study, significantly higher than the 64.4 citations per study in the manual SLR, indicating a possible bias toward highly cited papers. However, none of the 32 PSs selected manually were included in the AI-generated results, highlighting recall and selection accuracy limitations. Key findings reveal that Consensus accelerates literature retrieval but suffers from hallucinations, reference inaccuracies, and limited critical analysis. Specifically, it failed to capture nuanced research challenges and missed important application domains. Precision, recall, and F1 scores of the AI-selected studies were 76.2%, 38.1%, and 50.6%, respectively, demonstrating that while AI retrieves relevant papers with high precision, it lacks comprehensiveness. To mitigate these limitations, we propose a hybrid AI-human SLR framework, where AI enhances search efficiency while human reviewers ensure rigor and validity. While AI can support literature reviews, human oversight remains essential for ensuring accuracy and depth. 
Future research should assess AI-assisted SLRs across multiple disciplines to validate generalizability and explore domain-specific LLMs for improved performance.
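The precision, recall, and F1 figures above follow the standard information-retrieval definitions. The minimal Python sketch below assumes that true positives are the 16 Consensus-selected studies that overlap with the manual selection and that false positives are the 5 studies the author flags as such. The helper names `precision`, `recall`, and `f1` are illustrative and are not taken from the paper. The abstract does not state the false-negative count, so the sketch does not try to reproduce the recall figure. It only defines recall alongside the other two metrics.

```python
def precision(tp: int, fp: int) -> float:
    """Share of retrieved studies that are relevant: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of relevant studies that were retrieved: TP / (TP + FN)."""
    return tp / (tp + fn)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Assumed mapping from the abstract: 16 overlapping studies = true positives,
    # 5 false positives among the AI-selected studies.
    p = precision(tp=16, fp=5)
    print(f"precision = {p:.1%}")  # precision = 76.2%
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs. This is why a high precision paired with a low recall yields a middling F1.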
Pages: 56210-56219
Number of pages: 10
Related papers (50 in total)
  • [1] Comparing Integrative and Systematic Literature Reviews
    Cho, Yonjoo
    HUMAN RESOURCE DEVELOPMENT REVIEW, 2022, 21 (02) : 147 - 151
  • [2] AI meets academia: transforming systematic literature reviews
    Tomczyk, Przemyslaw
    Brueggemann, Philipp
    Vrontis, Demetris
    EUROMED JOURNAL OF BUSINESS, 2024
  • [3] Do Systematic Literature Reviews Outperform Informal Literature Reviews in the Software Engineering Domain? An Initial Case Study
    Niazi, Mahmood
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2015, 40 (03) : 845 - 855
  • [5] OPPORTUNITIES AND LIMITATIONS IN THE USE OF AI TO ASSIST WITH DATA EXTRACTION IN SYSTEMATIC LITERATURE REVIEWS
    Roussi, K.
    Rice, H.
    King, E.
    Martin, A.
    VALUE IN HEALTH, 2024, 27 (12) : S626 - S626
  • [6] Literature research: Aims and design of systematic reviews
    de Vet, HCW
    Verhagen, AP
    Logghe, I
    Ostelo, RWJG
    AUSTRALIAN JOURNAL OF PHYSIOTHERAPY, 2005, 51 (02) : 125 - 128
  • [7] The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation
    Gwon, Yong Nam
    Kim, Jae Heon
    Chung, Hyun Soo
    Jung, Eun Jee
    Chun, Joey
    Lee, Serin
    Shim, Sung Ryul
    JMIR MEDICAL INFORMATICS, 2024, 12
  • [9] Systematic Reviews: When the Published Literature Is the Data
    Lyles, Alan
    CLINICAL THERAPEUTICS, 2010, 32 (10) : 1754 - 1755
  • [10] Systematic Literature Reviews in Social Sciences and Humanities: A Case Study
    Mangas-Vega, Almudena
    Dantas, Taisa
    Merchan Sanchez-Jara, Javier
    Gomez-Diaz, Raquel
    JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2018, 11 (01) : 1 - 17