Automating Ischemic Stroke Subtype Classification Using Machine Learning and Natural Language Processing

Cited by: 84
Authors
Garg, Ravi [1]
Oh, Elissa [1]
Naidech, Andrew [1]
Kording, Konrad [2]
Prabhakaran, Shyam [3]
Affiliations
[1] Northwestern Univ, Feinberg Sch Med, Dept Neurol, 633 St Clair St 2041, Chicago, IL 60611 USA
[2] Univ Penn, Philadelphia, PA 19104 USA
[3] Univ Chicago, Pritzker Sch Med, Dept Neurol, Chicago, IL 60611 USA
Source
Journal of Stroke and Cerebrovascular Diseases
Keywords
Ischemic stroke; cryptogenic; cardioembolism; natural language processing; machine learning; etiologic classification; causative classification; TOAST; mechanism; CCS
DOI
10.1016/j.jstrokecerebrovasdis.2019.02.004
CLC classification number
Q189 [Neuroscience]
Subject classification code
071006
Abstract
Objective: Manual adjudication of disease classification is time-consuming and error-prone, and it limits scaling to large datasets. In ischemic stroke (IS), subtype classification is critical for management and outcome prediction. This study sought to automate IS subtyping by applying natural language processing to electronic health records (EHR) in combination with machine learning methods.

Methods: Among IS patients from an observational registry whose TOAST subtypes had been adjudicated by board-certified vascular neurologists, we used natural language processing to analyze unstructured, text-based EHR data, including neurology progress notes and neuroradiology reports. We applied several feature selection methods to reduce the high dimensionality of the feature space and used 5-fold cross-validation to test the generalizability of our methods and minimize overfitting. We evaluated several machine learning methods and calculated kappa values for agreement between each machine learning approach and manual adjudication. We then performed blinded testing of the best algorithm on a held-out subset of 50 cases.

Results: Compared with manual classification, the best machine-based classification achieved a kappa of .25 using radiology reports alone, .57 using progress notes alone, and .57 using combined data. Kappa values varied by subtype, being highest for cardioembolic (.64) and lowest for cryptogenic cases (.47). In the held-out test subset, machine-based classification agreed with rater classification in 40 of 50 cases (kappa .72).

Conclusions: Automated machine learning approaches using textual data from the EHR show agreement with manual TOAST classification. If externally validated, the automated pipeline could enable large-scale stroke epidemiology research.
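The evaluation described in the Methods rests on two standard tools: k-fold cross-validation and a kappa statistic. The abstract says only "kappa"; assuming it is Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. For the held-out subset, 40 of 50 agreements gives p_o = 0.80, so a reported kappa of .72 corresponds to a chance agreement of roughly (0.80 - 0.72) / (1 - 0.72), about 0.29. The following is a minimal, dependency-free Python sketch of these two pieces only. The function names, the shuffled (unstratified) fold split, and the toy labels (CE = cardioembolism, LAA = large-artery atherosclerosis, SVO = small-vessel occlusion) are hypothetical illustrations, not the authors' pipeline; the NLP feature extraction, feature selection, and classifiers are omitted.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # p_o: fraction of cases on which both raters chose the same subtype.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected by chance, from each rater's marginal label counts.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every case: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n_samples-1 and split them into k (train, test) pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread samples as evenly as possible: the first n_samples % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical toy labels (not study data): manual adjudication vs. a model.
manual = ["CE", "CE", "LAA", "SVO", "cryptogenic", "CE", "LAA", "SVO", "cryptogenic", "CE"]
model = ["CE", "CE", "LAA", "SVO", "cryptogenic", "LAA", "LAA", "cryptogenic", "cryptogenic", "CE"]
print(round(cohens_kappa(manual, model), 2))  # 0.73

# In a full pipeline, a classifier would be fit on each training split and its
# predictions on the test split scored with cohens_kappa; only the splits are shown.
folds = k_fold_indices(len(manual), k=5, seed=0)
print([len(test) for _, test in folds])  # [2, 2, 2, 2, 2]
```

In the toy example the raters agree on 8 of 10 cases (p_o = 0.80), but because some agreement is expected by chance (p_e = 0.26), kappa comes out lower, at about 0.73. This is why the study reports kappa rather than raw percent agreement.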
Pages: 2045-2051
Number of pages: 7