A machine learning-based data mining in medical examination data: a biological features-based biological age prediction model

Cited by: 6
Authors
Yang, Qing [1]
Gao, Sunan [2]
Lin, Junfen [1]
Lyu, Ke [3]
Wu, Zexu [3]
Chen, Yuhao [3]
Qiu, Yinwei [1]
Zhao, Yanrong [1]
Wang, Wei [1]
Lin, Tianxiang [1]
Pan, Huiyun [4]
Chen, Ming [3, 4]
Affiliations
[1] Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou 310051, People's Republic of China
[2] Zhejiang University, College of Biosystems Engineering and Food Science, Hangzhou 310058, People's Republic of China
[3] Zhejiang University, College of Life Sciences, Hangzhou 310058, People's Republic of China
[4] Zhejiang University, First Affiliated Hospital, School of Medicine, Hangzhou 310058, People's Republic of China
Keywords
Biological age; Biological features; Machine learning; Interpolation; Stacking; Health status; BLOOD-PRESSURE; BODY HEIGHT; MORTALITY; POPULATION; BIOMARKERS; ADULTS; AUTOENCODERS; CAPACITY; REVEAL; TRENDS
DOI
10.1186/s12859-022-04966-7
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Biological age (BA) has been recognized as a more accurate indicator of aging than chronological age (CA). However, current work has three limitations: insufficient attention to the incompleteness of the medical data used to construct BA; a lack of machine learning-based BA (ML-BA) for the Chinese population; and neglect of how the degree of model overfitting affects the stability of association results.

Methods and results: Based on medical examination data from the Chinese population (aged 45-90 years), we first evaluated which method for interpolating missing values was most suitable, then constructed 14 ML-BAs based on biomarkers, and finally explored the associations between the ML-BAs and health statuses (health risk indicators and disease). We found that round-robin linear regression interpolation performed best, while AutoEncoder showed the highest interpolation stability. We further illustrated the potential overfitting problem in ML-BAs, which affected the stability of the ML-BAs' associations with health statuses. We then proposed a composite ML-BA based on the Stacking method with a simple meta-model (STK-BA), which overcame the overfitting problem and was associated more strongly with CA (r = 0.66, P < 0.001), health risk indicators, disease counts, and six types of disease.

Conclusion: We provide an improved aging measurement method for middle-aged and elderly groups in China that more stably captures aging characteristics beyond CA, supporting the emerging potential of machine learning in aging research.
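The abstract names round-robin linear regression interpolation as the best-performing way to fill missing examination values, but this record does not give the authors' implementation. The following is only a minimal sketch of the general idea, under stated assumptions: a two-column table, missing entries marked as `None`, column-mean initialization, and a fixed number of passes. The function names `fit_line` and `round_robin_impute` and the toy values are hypothetical and do not come from the paper.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def round_robin_impute(rows, n_iter=20):
    """Fill None entries of a two-column table by chained linear regression.

    Assumption for this sketch: no row has both values missing.
    """
    n_cols = 2
    missing = [[row[j] is None for j in range(n_cols)] for row in rows]
    data = [list(row) for row in rows]

    # Initial guess: replace each missing entry with its column's observed mean.
    for j in range(n_cols):
        col_mean = fmean(row[j] for row in rows if row[j] is not None)
        for i in range(len(data)):
            if missing[i][j]:
                data[i][j] = col_mean

    # Round-robin passes: each column in turn is regressed on the other one.
    for _ in range(n_iter):
        for target in range(n_cols):
            other = 1 - target
            # Fit only on rows where the target value was actually observed.
            observed = [i for i in range(len(data)) if not missing[i][target]]
            slope, intercept = fit_line(
                [data[i][other] for i in observed],
                [data[i][target] for i in observed],
            )
            # Overwrite the target's missing entries with the new predictions.
            for i in range(len(data)):
                if missing[i][target]:
                    data[i][target] = slope * data[i][other] + intercept
    return data


# Synthetic toy table (not study data): second column = 2 * first column + 1.
toy = [(1.0, 3.0), (2.0, 5.0), (3.0, None), (None, 9.0), (5.0, 11.0), (6.0, 13.0)]
filled = round_robin_impute(toy)
print(filled[2][1], filled[3][0])
```

Because each column is repeatedly re-predicted from the current estimate of the other, the filled values settle after a few passes on this toy table. With more than two biomarkers the same loop would regress each column on all the others, which is the scheme scikit-learn's `IterativeImputer` follows.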
Pages: 23
Journal
BMC Bioinformatics, 23