Automating Ischemic Stroke Subtype Classification Using Machine Learning and Natural Language Processing

Cited by: 84

Authors:

Garg, Ravi [1]

Oh, Elissa [1]

Naidech, Andrew [1]

Kording, Konrad [2]

Prabhakaran, Shyam [3]

Affiliations:

[1] Northwestern Univ, Feinberg Sch Med, Dept Neurol, 633 St Clair St 2041, Chicago, IL 60611 USA

[2] Univ Penn, Philadelphia, PA 19104 USA

[3] Univ Chicago, Pritzker Sch Med, Dept Neurol, Chicago, IL 60611 USA

Source:

JOURNAL OF STROKE & CEREBROVASCULAR DISEASES | 2019 / Volume 28 / Issue 07

Keywords:

Ischemic stroke; cryptogenic; cardioembolism; natural language processing; machine learning; ETIOLOGIC CLASSIFICATION; CAUSATIVE CLASSIFICATION; TOAST; MECHANISM; CCS;

DOI:

10.1016/j.jstrokecerebrovasdis.2019.02.004

Chinese Library Classification (CLC) number:

Q189 [Neuroscience];

Discipline classification code:

071006

Abstract:

Objective: Manual adjudication of disease classification is time-consuming, error-prone, and difficult to scale to large datasets. In ischemic stroke (IS), subtype classification is critical for management and outcome prediction. This study sought to use natural language processing of electronic health records (EHR) combined with machine learning methods to automate IS subtyping. Methods: Among IS patients from an observational registry with TOAST subtyping adjudicated by board-certified vascular neurologists, we analyzed unstructured text-based EHR data, including neurology progress notes and neuroradiology reports, using natural language processing. We applied several feature selection methods to reduce the high dimensionality of the features and used 5-fold cross-validation to test the generalizability of our methods and minimize overfitting. We used several machine learning methods and calculated kappa values for agreement between each machine learning approach and manual adjudication. We then performed blinded testing of the best algorithm against a held-out subset of 50 cases. Results: Compared to manual classification, the best machine-based classification achieved a kappa of .25 using radiology reports alone, .57 using progress notes alone, and .57 using combined data. Kappa values varied by subtype, being highest for cardioembolic (.64) and lowest for cryptogenic cases (.47). In the held-out test subset, machine-based classification agreed with rater classification in 40 of 50 cases (kappa .72). Conclusions: Automated machine learning approaches using textual data from the EHR show agreement with manual TOAST classification. The automated pipeline, if externally validated, could enable large-scale stroke epidemiology research.
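
Note on the agreement statistic: the abstract reports Cohen's kappa between machine-based and manual classification. As an illustration only, kappa can be computed from two label sequences as in the minimal sketch below. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the study; the authors' actual implementation is not described in this record. Agreement in 40 of 50 cases corresponds to an observed agreement of 0.80, and kappa additionally discounts the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters give the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_e == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy labels (NOT study data): manual vs. machine subtype calls.
manual = ["cardioembolic", "cardioembolic", "cryptogenic", "cryptogenic"]
machine = ["cardioembolic", "cardioembolic", "cryptogenic", "cardioembolic"]
print(cohen_kappa(manual, machine))
```

In practice a library routine such as scikit-learn's `cohen_kappa_score` would typically be used; the pure-Python version is shown only to make the formula explicit.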


Pages: 2045-2051

Page count: 7
