Using Data-Driven Algorithms with Large-Scale Plasma Proteomic Data to Discover Novel Biomarkers for Diagnosing Depression

Authors
Ma, Simeng [1]
Li, Ruiling [1]
Gong, Qian [1]
Lv, Honggang [1]
Deng, Zipeng [1]
Wang, Beibei [1]
Yao, Lihua [1]
Kang, Lijun [1]
Xiang, Dan [1]
Yang, Jun [2]
Liu, Zhongchun [1,3]
Affiliations
[1] Wuhan Univ, Renmin Hosp, Dept Psychiat, Wuhan 430060, Peoples R China
[2] Wuhan Univ Technol, Sch Informat Engn, Wuhan 430070, Peoples R China
[3] Wuhan Univ, Taikang Ctr Life & Med Sci, Wuhan 430071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Depression; Proteomic; Biomarkers; CatBoost; MAJOR DEPRESSION; GENE; SCHIZOPHRENIA; HETEROGENEITY; ASSOCIATIONS; DISORDER;
DOI
10.1021/acs.jproteome.4c00389
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Recent technological advances in proteomics make it possible to quantify plasma proteomes in large patient cohorts, both to screen for biomarkers and to guide the early diagnosis and treatment of depression. Here we used CatBoost machine learning to model depression and discover its biomarkers in UK Biobank data sets (depression n = 4,479; healthy controls n = 19,821). Models were built with CatBoost and interpreted with Shapley Additive Explanations (SHAP). Model performance was assessed by 5-fold cross-validation, and diagnostic efficacy was evaluated by the area under the receiver operating characteristic curve (AUC). A total of 45 depression-related proteins were screened from the top 20 most important features output by the CatBoost model across six data sets. Among the nine diagnostic models for depression, adding proteomic data improved the performance of the traditional risk factor model, and the best model achieved an average AUC of 0.764 in the test sets. KEGG pathway analysis of the 45 screened proteins showed that the most significant pathway was cytokine-cytokine receptor interaction. Exploring diagnostic biomarkers of depression with data-driven machine learning methods and large-scale data sets is feasible, although the results require validation.
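The evaluation scheme named in the abstract (5-fold cross-validation scored by AUC, averaged over test folds) can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline: a toy centroid-difference scorer stands in for CatBoost, SHAP interpretation is omitted, and the stratified fold assignment is an assumption, since the abstract does not say how the folds were drawn. All function names below are hypothetical.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Deal shuffled case and control indices round-robin into k test folds (assumed scheme)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def fit_centroid_scorer(X_train, y_train):
    """Toy stand-in for a real classifier: project onto (case centroid - control centroid)."""
    n_features = len(X_train[0])

    def centroid(cls):
        rows = [x for x, y in zip(X_train, y_train) if y == cls]
        return [mean(r[j] for r in rows) for j in range(n_features)]

    direction = [p - n for p, n in zip(centroid(1), centroid(0))]
    return lambda x: sum(w * v for w, v in zip(direction, x))


def cross_validated_auc(X, y, fit, k=5, seed=0):
    """Fit on k-1 folds, score the held-out fold, and return (mean AUC, per-fold AUCs)."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_scores = [predict(X[i]) for i in test_idx]
        aucs.append(roc_auc([y[i] for i in test_idx], fold_scores))
    return mean(aucs), aucs


if __name__ == "__main__":
    # Synthetic data only: two noisy "protein" features, 40 controls and 20 cases.
    rng = random.Random(42)
    y = [0] * 40 + [1] * 20
    X = [[rng.gauss(0.8 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]
    avg_auc, fold_aucs = cross_validated_auc(X, y, fit_centroid_scorer)
    print(f"mean test AUC over {len(fold_aucs)} folds: {avg_auc:.3f}")
```

In a real analysis the `fit` argument would wrap an actual classifier; the averaging over held-out folds is the part that corresponds to the "average AUC in the test sets" reported above.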
Pages: 4043-4054
Page count: 12