Machine learning prediction of incidence of Alzheimer's disease using large-scale administrative health data

Cited by: 54
Authors
Park, Ji Hwan [1]
Cho, Han Eol [2, 3]
Kim, Jong Hun [4]
Wall, Melanie M. [5]
Stern, Yaakov [5, 6]
Lim, Hyunsun [7]
Yoo, Shinjae [1]
Kim, Hyoung Seop [8]
Cha, Jiook [5, 9, 10, 11]
Affiliations
[1] Brookhaven Natl Lab, Computat Sci Initiat, Upton, NY 11973 USA
[2] Yonsei Univ, Gangnam Severance Hosp, Dept Rehabil Med, Coll Med, Seoul, South Korea
[3] Yonsei Univ, Coll Med, Rehabil Inst Neuromuscular Dis, Seoul, South Korea
[4] Ilsan Hosp, Dementia Ctr, Dept Neurol, Natl Hlth Insurance Serv, Goyang, South Korea
[5] Columbia Univ, Vagelos Coll Phys & Surg, Dept Psychiat, New York, NY 10025 USA
[6] Columbia Univ, Vagelos Coll Phys & Surg, Dept Neurol, New York, NY 10025 USA
[7] Ilsan Hosp, Natl Hlth Insurance Serv, Res & Anal Team, Goyang, South Korea
[8] Ilsan Hosp, Dementia Ctr, Dept Phys Med & Rehabil, Natl Hlth Insurance Serv, Goyang, South Korea
[9] Seoul Natl Univ, Dept Psychol, Seoul, South Korea
[10] Seoul Natl Univ, Dept Brain & Cognit Sci, Seoul, South Korea
[11] Seoul Natl Univ, Grad Sch Data Sci, Seoul, South Korea
Funding
National Research Foundation, Singapore;
Keywords
DEMENTIA RISK; COGNITIVE DEFICITS; OLDER PERSONS; POPULATION; DYSFUNCTION; MODELS; ANEMIA; SAMPLE; COHORT;
DOI
10.1038/s41746-020-0256-0
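Chinese Library Classification number
R19 [Health care organization and services (health services administration)];
Subject classification code
Abstract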
Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models based on individuals' history of health and healthcare, going beyond existing risk prediction models. We tested whether machine learning models can predict future incidence of Alzheimer's disease (AD) using large-scale administrative health data. From the Korean National Health Insurance Service database between 2002 and 2010, we obtained de-identified health data on adults older than 65 years (N = 40,736) containing 4,894 unique clinical features, including ICD-10 codes, medication codes, laboratory values, history of personal and family illness, and socio-demographics. To define incident AD we considered two operational definitions: "definite AD", with diagnostic codes and dementia medication (n = 614), and "probable AD", with diagnosis only (n = 2,026). We trained and validated random forest, support vector machine, and logistic regression models to predict incident AD in 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction, with AUC of 0.775 and 0.759 based on "definite AD" and "probable AD" outcomes, respectively; in 2-year prediction, 0.730 and 0.693; in 3-year, 0.677 and 0.644; in 4-year, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected in logistic regression included hemoglobin level, age, and urine protein level. This study may shed light on the utility of data-driven machine learning models based on large-scale administrative health data in AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or early detection in clinical settings.
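The abstract reports AUC values measured on class-balanced samples. As a rough illustration of those two ideas only, the sketch below balances a binary outcome by undersampling the majority class and computes AUC as the probability that a randomly chosen case scores higher than a randomly chosen non-case. The function names `balanced_resample` and `auc`, the toy labels and scores, and the use of undersampling are all assumptions made for illustration; the abstract does not specify the study's exact balancing procedure, and this is not the authors' code.

```python
import random


def balanced_resample(labels, seed=0):
    """Return shuffled indices of a class-balanced sample.

    Hypothetical helper: undersamples the larger class so that cases (1)
    and non-cases (0) appear in equal numbers.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    idx = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(idx)
    return idx


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, non-case) pairs in which the case has
    the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 2 cases among 8 individuals.
labels = [1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.3, 0.5]

idx = balanced_resample(labels, seed=42)
balanced_labels = [labels[i] for i in idx]
balanced_scores = [scores[i] for i in idx]

full_auc = auc(labels, scores)                       # unbalanced sample
balanced_auc = auc(balanced_labels, balanced_scores)  # balanced sample
```

The pairwise loop is quadratic in sample size, which is fine for a toy example; for a cohort of tens of thousands a rank-based formulation or a library routine would be the more practical choice.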
Pages: 7
Related papers
50 items in total
  • [21] A Multi-modal Data Platform for Diagnosis and Prediction of Alzheimer’s Disease Using Machine Learning Methods
    Zhen Pang
    Xiang Wang
    Xulong Wang
    Jun Qi
    Zhong Zhao
    Yuan Gao
    Yun Yang
    Po Yang
    Mobile Networks and Applications, 2021, 26 : 2341 - 2352
  • [22] A Machine-Learning Approach for Communication Prediction of Large-Scale Applications
    Papadopoulou, Nikela
    Goumas, Georgios
    Koziris, Nectarios
    2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015, : 120 - 123
  • [23] Altered large-scale dynamic connectivity patterns in Alzheimer's disease and mild cognitive impairment patients: A machine learning study
    Jing, Rixing
    Chen, Pindong
    Wei, Yongbin
    Si, Juanning
    Zhou, Yuying
    Wang, Dawei
    Song, Chengyuan
    Yang, Hongwei
    Zhang, Zengqiang
    Yao, Hongxiang
    Kang, Xiaopeng
    Fan, Lingzhong
    Han, Tong
    Qin, Wen
    Zhou, Bo
    Jiang, Tianzi
    Lu, Jie
    Han, Ying
    Zhang, Xi
    Liu, Bing
    Yu, Chunshui
    Wang, Pan
    Liu, Yong
    Alzheimers Dis Neuroimaging Initiat
    HUMAN BRAIN MAPPING, 2023, 44 (09) : 3467 - 3480
  • [24] Custom machine learning algorithm for large-scale disease screening - taking heart disease data as an example
    Chen, Leran
    Ji, Ping
    Ma, Yongsheng
    Rong, Yiming
    Ren, Jingzheng
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 146
  • [25] Large-scale gene expression in Alzheimer's disease lymphoblasts
    Scherzer, CR
    Lah, J
    Bennett-Desmelik, JA
    Fang, G
    Counts, S
    Greenamyre, JT
    Levey, AI
    ANNALS OF NEUROLOGY, 2000, 48 (03) : 429 - 429
  • [26] Large-scale proteomics collaboration to study Alzheimer's disease
    [Author unknown]
    PHARMACOGENOMICS, 2003, 4 (03) : 243 - 243
  • [27] A Review of Alzheimer's Disease Classification Using Neuropsychological Data and Machine Learning
    Lyu, Gang
    2018 11TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2018), 2018,
  • [28] A Survey on Large-Scale Machine Learning
    Wang, Meng
    Fu, Weijie
    He, Xiangnan
    Hao, Shijie
    Wu, Xindong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (06) : 2574 - 2594
  • [29] Large-scale prediction of activity cliffs using machine and deep learning methods of increasing complexity
    Shunsuke Tamura
    Tomoyuki Miyao
    Jürgen Bajorath
    Journal of Cheminformatics, 15
  • [30] The Role of Medication Data to Enhance the Prediction of Alzheimer's Progression Using Machine Learning
    El-Sappagh, Shaker
    Abuhmed, Tamer
    Alouffi, Bader
    Sahal, Radhya
    Abdelhade, Naglaa
    Saleh, Hager
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021