Comprehensive testing of large language models for extraction of structured data in pathology

Authors
Bastian Grothey [1]
Jan Odenkirchen [2]
Adnan Brkic [1]
Birgid Schömig-Markiefka [1]
Alexander Quaas [1]
Reinhard Büttner [1]
Yuri Tolkach [1]
Affiliations
[1] University Hospital Cologne, Institute of Pathology
[2] University of Cologne, Medical Faculty
DOI
10.1038/s43856-025-00808-8
Abstract
Pathology departments produce many diagnostic reports as free text, which is hard to analyze or use in research and computer projects. Converting this free text into standardized, structured information, such as test results or diagnoses, makes it easier to use. This task often requires human experts and takes time. Large language models (LLMs), which are advanced computer systems designed to understand and generate human-like text, might simplify this process. Here, we tested six LLMs, including freely available models and the commercial GPT-4 model, using 579 pathology reports in English and German. Our results show that freely available models can perform as well as the commercial model, providing a cheaper solution while avoiding privacy concerns. The shared dataset will support future research in pathology data processing.