A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries

Times cited: 167
Authors
Jiang, Min [1]
Chen, Yukun [1]
Liu, Mei [1]
Rosenbloom, S. Trent [1,2]
Mani, Subramani [1]
Denny, Joshua C. [1,2]
Xu, Hua [1]
Affiliations
[1] Vanderbilt Univ, Sch Med, Dept Biomed Informat, Nashville, TN 37232 USA
[2] Vanderbilt Univ, Sch Med, Dept Med, Nashville, TN 37232 USA
Keywords
MEDICATION INFORMATION; SYSTEM; RECOGNITION; TEXTS; NAMES
DOI
10.1136/amiajnl-2011-000163
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Objective: The authors' goal was to develop and evaluate machine-learning (ML)-based approaches to extracting clinical entities, including medical problems, tests, and treatments, together with their asserted status, from hospital discharge summaries written in natural language. This project was part of the 2010 Center of Informatics for Integrating Biology and the Bedside (i2b2)/Veterans Affairs (VA) natural-language-processing (NLP) challenge.

Design: The authors implemented an ML-based named entity recognition (NER) system for clinical text and systematically evaluated the contributions of different types of features and ML algorithms using a training corpus of 349 annotated notes. Based on the results on the training data, the authors developed a novel hybrid clinical entity extraction system that integrates heuristic rule-based modules with the ML-based NER module. They applied the hybrid system to the concept extraction and assertion classification tasks of the challenge and evaluated its performance on a test set of 477 annotated notes.

Measurements: Standard measures, including precision, recall, and F-measure, were calculated using the evaluation script provided by the i2b2/VA challenge organizers. The overall performance across all three types of clinical entities and all six types of assertions over the 477 annotated test notes was the primary metric in the challenge.

Results and discussion: Systematic evaluation on the training set showed that Conditional Random Fields (CRF) outperformed Support Vector Machines (SVM), and that semantic information from existing NLP systems substantially improved performance, although the contributions of different feature types varied. On the test set, the hybrid entity extraction system achieved a maximum overall F-score of 0.8391 for concept extraction (ranked second) and 0.9313 for assertion classification (ranked fourth, but not statistically different from the top three systems).
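The abstract casts concept extraction as named entity recognition, which a sequence labeler such as a CRF solves by assigning one tag per token. The abstract does not state the tag encoding, so the sketch below assumes the common BIO scheme (Begin/Inside/Outside) and token-index spans; `spans_to_bio`, `bio_to_spans`, and the `Span` tuple are hypothetical names for illustration, not the authors' code. It shows how gold concept annotations become training labels and how predicted tags are decoded back into concept spans.

```python
from typing import List, Optional, Tuple

# Hypothetical span format for this sketch: (first_token, last_token_inclusive, concept_type).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Tag every token "O", then mark each annotated span with B-/I- tags."""
    tags = ["O"] * num_tokens
    for start, end, ctype in spans:
        tags[start] = f"B-{ctype}"
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{ctype}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode predicted tags back into spans; a stray I- tag opens a new span."""
    spans: List[Span] = []
    start: Optional[int] = None
    ctype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last open span
        if tag.startswith("I-") and tag[2:] == ctype:
            continue  # the current span continues
        if ctype is not None:
            spans.append((start, i - 1, ctype))
            start, ctype = None, None
        if tag != "O":
            start, ctype = i, tag[2:]
    return spans


tokens = ["Chest", "x-ray", "showed", "no", "pneumonia", "."]
gold = [(0, 1, "test"), (4, 4, "problem")]
tags = spans_to_bio(len(tokens), gold)
print(tags)                # ['B-test', 'I-test', 'O', 'O', 'B-problem', 'O']
print(bio_to_spans(tags))  # [(0, 1, 'test'), (4, 4, 'problem')]
```

The tagger only locates "pneumonia" as a problem; whether that problem is asserted as present or absent is the separate assertion classification task.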
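The evaluation reports precision, recall, and F-measure, with overall performance across all entity types as the primary metric. The following is a minimal sketch of those measures, assuming exact matching on a hypothetical (note, start, end, type) tuple and micro-averaging by pooling all concept types; it is not the official i2b2/VA evaluation script, whose matching rules are not described here. `precision_recall_f1` and `Mention` are illustrative names.

```python
from typing import Iterable, Tuple

# Hypothetical mention format: (note_id, first_token, last_token, concept_type).
# Exact match on the whole tuple is an assumption of this sketch.
Mention = Tuple[str, int, int, str]


def precision_recall_f1(gold: Iterable[Mention],
                        predicted: Iterable[Mention]) -> Tuple[float, float, float]:
    """Micro-averaged scores: all concept types are pooled before counting."""
    gold_set = set(gold)
    pred_set = set(predicted)
    tp = len(gold_set & pred_set)  # predictions that exactly match a gold mention
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


gold = [("n1", 0, 1, "test"), ("n1", 4, 4, "problem"),
        ("n2", 2, 3, "treatment"), ("n2", 7, 7, "problem")]
pred = [("n1", 0, 1, "test"), ("n1", 4, 4, "problem"),
        ("n2", 2, 2, "treatment")]  # boundary error: not an exact match
p, r, f = precision_recall_f1(gold, pred)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.6667 0.5 0.5714
```

Pooling all types yields a single overall score; per-type scores follow by filtering both lists on the concept type before calling the function.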
Pages: 601-606
Number of pages: 6