Using Data-Driven Algorithms with Large-Scale Plasma Proteomic Data to Discover Novel Biomarkers for Diagnosing Depression

Authors
Ma, Simeng [1]
Li, Ruiling [1]
Gong, Qian [1]
Lv, Honggang [1]
Deng, Zipeng [1]
Wang, Beibei [1]
Yao, Lihua [1]
Kang, Lijun [1]
Xiang, Dan [1]
Yang, Jun [2]
Liu, Zhongchun [1, 3]
Affiliations
[1] Wuhan Univ, Renmin Hosp, Dept Psychiat, Wuhan 430060, Peoples R China
[2] Wuhan Univ Technol, Sch Informat Engn, Wuhan 430070, Peoples R China
[3] Wuhan Univ, Taikang Ctr Life & Med Sci, Wuhan 430071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Depression; Proteomic; Biomarkers; CatBoost; Major depression; Gene; Schizophrenia; Heterogeneity; Associations; Disorder
DOI
10.1021/acs.jproteome.4c00389
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Recent technological advances in proteomics make it possible to quantify plasma proteomes in large patient cohorts, both to screen for biomarkers and to guide the early diagnosis and treatment of depression. Here we used the CatBoost machine learning algorithm to model depression and discover its biomarkers in UK Biobank data sets (depression n = 4,479; healthy controls n = 19,821). CatBoost was used for model construction, and Shapley Additive Explanations (SHAP) were used to interpret the resulting model. Model performance was assessed by 5-fold cross-validation, and diagnostic efficacy was evaluated by the area under the receiver operating characteristic curve (AUC). A total of 45 depression-related proteins were screened based on the top 20 most important features output by the CatBoost model in six data sets. Across the nine diagnostic models for depression, adding proteomic data improved the performance of the traditional risk factor model, and the best model achieved an average AUC of 0.764 in the test sets. KEGG pathway analysis of the 45 screened proteins showed that the most significant pathway was cytokine-cytokine receptor interaction. Exploring diagnostic biomarkers of depression with data-driven machine learning methods and large-scale data sets is feasible, although the results require validation.
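The evaluation scheme named in the abstract (5-fold cross-validation scored by AUC) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than UK Biobank measurements, the function names are hypothetical, and a simple class-centroid scorer stands in for the CatBoost classifier, so it shows only the validation loop and not the authors' model or their SHAP analysis.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """AUC via the Mann-Whitney rank statistic (tied scores get average ranks)."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, keeping the case/control ratio similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def fit_centroid_scorer(X, y):
    """Stand-in model (NOT CatBoost): weight = case mean minus control mean."""
    n_feat = len(X[0])
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    w = [mean(r[j] for r in pos) - mean(r[j] for r in neg) for j in range(n_feat)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def cross_validated_auc(X, y, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the fold AUCs."""
    fold_aucs = []
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        X_tr = [x for i, x in enumerate(X) if i not in held_out]
        y_tr = [label for i, label in enumerate(y) if i not in held_out]
        score = fit_centroid_scorer(X_tr, y_tr)
        fold_aucs.append(roc_auc([y[i] for i in fold], [score(X[i]) for i in fold]))
    return mean(fold_aucs), fold_aucs


def make_synthetic(n=500, n_feat=5, pos_rate=0.18, seed=1):
    """Synthetic 'protein levels': only feature 0 carries signal (assumption)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < pos_rate else 0
        row = [rng.gauss(0, 1) for _ in range(n_feat)]
        row[0] += 1.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_synthetic()
mean_auc, fold_aucs = cross_validated_auc(X, y, k=5)
print(f"mean AUC over 5 folds: {mean_auc:.3f}")
```

Stratifying the folds keeps the case/control ratio roughly constant across folds, which matters when cases are a minority, as they are in the cohort described above. In a real analysis the stand-in scorer would be replaced by the gradient-boosted model, and the per-fold AUCs would be reported alongside their mean.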
Pages: 4043-4054
Page count: 12