Information extraction from weakly structured radiological reports with natural language queries

Times cited: 4
Authors
Dada, Amin [1]
Ufer, Tim Leon [1]
Kim, Moon [1]
Hasin, Max [1]
Spieker, Nicola [2]
Forsting, Michael [1, 3]
Nensa, Felix [1, 3]
Egger, Jan [1, 4]
Kleesiek, Jens [1, 2, 5]
Affiliations
[1] Univ Hosp Essen, Inst AI Med IKIM, Girardetstr 2, D-45131 Essen, Germany
[2] Dr Kruger MVZ GmbH, Bocholt, Germany
[3] Univ Hosp Essen, Inst Diagnost & Intervent Radiol & Neuroradiol, Essen, Germany
[4] Univ Med Essen, Canc Res Ctr Cologne Essen CCCE, Essen, Germany
[5] German Canc Consortium DKTK, Partner Site Essen, Essen, Germany
Keywords
Information extraction; Natural language processing; Machine learning
DOI
10.1007/s00330-023-09977-3
Chinese Library Classification number
R8 [Special medicine]; R445 [Diagnostic imaging]
Discipline classification codes
1002; 100207; 1009
Abstract
Objectives: Provide physicians and researchers an efficient way to extract information from weakly structured radiology reports with natural language processing (NLP) machine learning models.
Methods: We evaluate seven different German bidirectional encoder representations from transformers (BERT) models on a dataset of 857,783 unlabeled radiology reports and an annotated reading comprehension dataset in the format of SQuAD 2.0 based on 1223 additional reports.
Results: Continued pre-training of a BERT model on the radiology dataset and a medical online encyclopedia resulted in the most accurate model with an F1-score of 83.97% and an exact match score of 71.63% for answerable questions and 96.01% accuracy in detecting unanswerable questions. Fine-tuning a non-medical model without further pre-training led to the lowest-performing model. The final model proved stable against variation in the formulations of questions and in dealing with questions on topics excluded from the training set.
Conclusions: General domain BERT models further pre-trained on radiological data achieve high accuracy in answering questions on radiology reports. We propose to integrate our approach into the workflow of medical practitioners and researchers to extract information from radiology reports.
Clinical relevance statement: By reducing the need for manual searches of radiology reports, radiologists' resources are freed up, which indirectly benefits patients.
Key Points:
  • BERT models pre-trained on general domain datasets and radiology reports achieve high accuracy (83.97% F1-score) on question-answering for radiology reports.
  • The best performing model achieves an F1-score of 83.97% for answerable questions and 96.01% accuracy for questions without an answer.
  • Additional radiology-specific pretraining of all investigated BERT models improves their performance.
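The abstract reports an exact match score and an F1-score in the SQuAD 2.0 setting. As a rough illustration of how such per-question scores are commonly computed, the following minimal sketch compares a predicted answer span with a gold span. It is not the authors' evaluation code: the function names, the simplified normalization (lowercasing and ASCII punctuation removal only, without the English article stripping of the official SQuAD script), and the German example strings are all assumptions made for illustration.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace (simplified assumption)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted span and a gold span."""
    pred, ref = normalize(prediction), normalize(gold)
    # Unanswerable questions are represented by an empty gold answer:
    # the score is 1.0 only if the prediction is empty as well.
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction covers two of the four gold tokens.
score = token_f1("Leber unauffällig", "Leber und Milz unauffällig")
```

In this hypothetical example the precision is 1.0 and the recall is 0.5, so the F1 is 2/3 while the exact match is 0. Dataset-level scores such as those in the abstract would be averages of these per-question values.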
Pages: 330-337
Page count: 8