A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries

Cited by: 167
Authors
Jiang, Min [1]
Chen, Yukun [1]
Liu, Mei [1]
Rosenbloom, S. Trent [1, 2]
Mani, Subramani [1]
Denny, Joshua C. [1, 2]
Xu, Hua [1]
Affiliations
[1] Vanderbilt Univ, Sch Med, Dept Biomed Informat, Nashville, TN 37232 USA
[2] Vanderbilt Univ, Sch Med, Dept Med, Nashville, TN 37232 USA
Keywords
MEDICATION INFORMATION; SYSTEM; RECOGNITION; TEXTS; NAMES;
DOI
10.1136/amiajnl-2011-000163
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: The authors' goal was to develop and evaluate machine-learning-based approaches to extracting clinical entities, including medical problems, tests, and treatments, as well as their asserted status, from hospital discharge summaries written in natural language. This project was part of the 2010 Center of Informatics for Integrating Biology and the Bedside (i2b2)/Veterans Affairs (VA) natural-language-processing challenge.
Design: The authors implemented a machine-learning (ML)-based named entity recognition system for clinical text and systematically evaluated the contributions of different types of features and ML algorithms, using a training corpus of 349 annotated notes. Based on the results from the training data, the authors developed a novel hybrid clinical entity extraction system, which integrated heuristic rule-based modules with the ML-based named entity recognition module. The authors applied the hybrid system to the concept extraction and assertion classification tasks in the challenge and evaluated its performance using a test data set of 477 annotated notes.
Measurements: Standard measures, including precision, recall, and F-measure, were calculated using the evaluation script provided by the Center of Informatics for Integrating Biology and the Bedside/VA challenge organizers. The overall performance for all three types of clinical entities and all six types of assertions across the 477 annotated notes was considered the primary metric in the challenge.
Results and discussion: Systematic evaluation on the training set showed that Conditional Random Fields outperformed Support Vector Machines, and that semantic information from existing natural-language-processing systems substantially improved performance, although contributions from different types of features varied. The authors' hybrid entity extraction system achieved a maximum overall F-score of 0.8391 for concept extraction (ranked second) and 0.9313 for assertion classification (ranked fourth, but not statistically different from the first three systems) on the test data set in the challenge.
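The precision, recall, and F-measure named in the abstract can be illustrated with a minimal sketch. This is not the challenge organizers' evaluation script, which is not reproduced in this record. It assumes exact-match scoring, in which a predicted entity counts as correct only when its span and type both match a gold annotation. The `Entity` tuple layout, the `precision_recall_f` function name, and the toy annotations are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start_token, end_token, entity_type).
Entity = Tuple[int, int, str]


def precision_recall_f(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and balanced F-measure over entity sets."""
    true_positives = len(gold & predicted)  # span and type must both match
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data (invented for illustration, not taken from the challenge corpus).
gold = {(0, 1, "problem"), (5, 5, "test"), (9, 10, "treatment")}
predicted = {
    (0, 1, "problem"),      # correct
    (5, 5, "treatment"),    # right span, wrong type: counted as an error
    (9, 10, "treatment"),   # correct
    (12, 12, "test"),       # spurious entity
}

p, r, f = precision_recall_f(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

With two of four predictions correct and two of three gold entities found, the sketch gives a precision of 0.5, a recall of about 0.667, and an F-measure of about 0.571.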
Pages: 601-606
Page count: 6
Related papers
50 in total
  • [1] Extracting Clinical entities and their assertions from Chinese Electronic Medical Records Based on Machine Learning
    Wang, Jianhong
    Peng, Yousong
    Liu, Bin
    Wu, Zhiqiang
    Deng, Lizong
    Jiang, Taijiao
    PROCEEDINGS OF THE 2016 3RD INTERNATIONAL CONFERENCE ON MATERIALS ENGINEERING, MANUFACTURING TECHNOLOGY AND CONTROL, 2016, 67 : 1503 - 1508
  • [2] A Study of Machine Learning Based Approaches to Extract Personality Information from Curriculum Vitae
    Dickmond, Leong
    Hameed, Vazeerudeen Abdul
    Rana, Muhammad Ehsan
    2021 14TH INTERNATIONAL CONFERENCE ON DEVELOPMENTS IN ESYSTEMS ENGINEERING (DESE), 2021 : 82 - 85
  • [3] Oxidation Stability of Hydrocarbons: A Machine-Learning-Based Study
    Venegas-Reynoso, Adrian
    Creton, Benoit
    Giarracca-Mehl, Lucia
    Lacoue-Negre, Marion
    Ruckebusch, Cyril
    Duponchel, Ludovic
    ENERGY & FUELS, 2025, 39 (09) : 4361 - 4373
  • [4] Combining Machine Learning with a Rule-Based Algorithm to Detect and Identify Related Entities of Documented Adverse Drug Reactions on Hospital Discharge Summaries
    Tan, Hui Xing
    Teo, Chun Hwee Desmond
    Ang, Pei San
    Loke, Wei Ping Celine
    Tham, Mun Yee
    Tan, Siew Har
    Soh, Bee Leng Sally
    Foo, Pei Qin Belinda
    Ling, Zheng Jye
    Yip, Wei Luen James
    Tang, Yixuan
    Yang, Jisong
    Tung, Kum Hoe Anthony
    Dorajoo, Sreemanee Raaj
    DRUG SAFETY, 2022, 45 (08) : 853 - 862
  • [5] Combining Machine Learning with a Rule-Based Algorithm to Detect and Identify Related Entities of Documented Adverse Drug Reactions on Hospital Discharge Summaries
    Hui Xing Tan
    Chun Hwee Desmond Teo
    Pei San Ang
    Wei Ping Celine Loke
    Mun Yee Tham
    Siew Har Tan
    Bee Leng Sally Soh
    Pei Qin Belinda Foo
    Zheng Jye Ling
    Wei Luen James Yip
    Yixuan Tang
    Jisong Yang
    Kum Hoe Anthony Tung
    Sreemanee Raaj Dorajoo
    Drug Safety, 2022, 45 : 853 - 862
  • [6] Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries
    Xu, Yan
    Hong, Kai
    Tsujii, Junichi
    Chang, Eric I-Chao
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2012, 19 (05) : 824 - 832
  • [7] Machine-Learning-Based Classification Approaches toward Recognizing Slope Stability Failure
    Moayedi, Hossein
    Dieu Tien Bui
    Kalantar, Bahareh
    Foong, Loke Kok
    APPLIED SCIENCES-BASEL, 2019, 9 (21):
  • [8] Comparative Study of Machine-Learning-Based Methods for Log Prediction
    Simoes, Vanessa
    Maniar, Hiren
    Abubakar, Aria
    Zhao, Tao
    PETROPHYSICS, 2023, 64 (02) : 192 - 212
  • [9] Dashboarding to Monitor Machine-Learning-Based Clinical Decision Support Interventions
    Hekman, Daniel J.
    Barton, Hanna J.
    Maru, Apoorva P.
    Wills, Graham
    Cochran, Amy L.
    Fritsch, Corey
    Wiegmann, Douglas A.
    Liao, Frank
    Patterson, Brian W.
    APPLIED CLINICAL INFORMATICS, 2024, 15 (01) : 164 - 169
  • [10] Machine-Learning-Based Approaches for Multi-Level Sentiment Analysis of Romanian Reviews
    Briciu, Anamaria
    Calin, Alina-Delia
    Miholca, Diana-Lucia
    Moroz-Dubenco, Cristiana
    Petrascu, Vladiela
    Dascalu, George
    MATHEMATICS, 2024, 12 (03)