Using Data-Driven Algorithms with Large-Scale Plasma Proteomic Data to Discover Novel Biomarkers for Diagnosing Depression

Authors
Ma, Simeng [1]
Li, Ruiling [1]
Gong, Qian [1]
Lv, Honggang [1]
Deng, Zipeng [1]
Wang, Beibei [1]
Yao, Lihua [1]
Kang, Lijun [1]
Xiang, Dan [1]
Yang, Jun [2]
Liu, Zhongchun [1,3]
Affiliations
[1] Wuhan Univ, Renmin Hosp, Dept Psychiat, Wuhan 430060, Peoples R China
[2] Wuhan Univ Technol, Sch Informat Engn, Wuhan 430070, Peoples R China
[3] Wuhan Univ, Taikang Ctr Life & Med Sci, Wuhan 430071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Depression; Proteomic; Biomarkers; CatBoost; MAJOR DEPRESSION; GENE; SCHIZOPHRENIA; HETEROGENEITY; ASSOCIATIONS; DISORDER;
DOI
10.1021/acs.jproteome.4c00389
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Recent technological advances in proteomics make it possible to quantify plasma proteomes in large patient cohorts, both to screen for biomarkers and to guide the early diagnosis and treatment of depression. Here we used machine learning to model and discover biomarkers of depression in UK Biobank data sets (depression n = 4,479, healthy control n = 19,821). Models were built with CatBoost and interpreted with Shapley Additive Explanations (SHAP). Model performance was assessed with 5-fold cross-validation, and diagnostic efficacy was evaluated by the area under the receiver operating characteristic curve (AUC). A total of 45 depression-related proteins were selected from the top 20 most important features output by the CatBoost model in each of six data sets. Among the nine diagnostic models for depression, adding proteomic data improved the performance of the traditional risk factor model, and the best model reached an average AUC of 0.764 in the test sets. KEGG pathway analysis of the 45 selected proteins showed that the most significant pathway was cytokine-cytokine receptor interaction. Exploring diagnostic biomarkers of depression with data-driven machine learning methods and large-scale data sets is feasible, although the results require validation.
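The evaluation and screening steps described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' pipeline: it contains no CatBoost model and no SHAP computation, and every function name here (`roc_auc`, `stratified_folds`, `cross_validated_auc`, `union_top_k`) is hypothetical. It also assumes stratified folds and a simple union of per-data-set top-k features, neither of which the abstract specifies.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in sample size, so this
    is for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k=5, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin.

    Assumption: stratification is our choice; the abstract only says
    "5-fold cross-validation".
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(X, y, fit, k=5, seed=0):
    """Mean held-out AUC over k folds.

    `fit(X_train, y_train)` is a placeholder for any model trainer and
    must return a function mapping one sample to a score.
    """
    aucs = []
    for test in stratified_folds(y, k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        score = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([y[i] for i in test], [score(X[i]) for i in test]))
    return mean(aucs)


def union_top_k(importances_per_dataset, k=20):
    """Union of the k highest-importance features from each data set.

    Each element of `importances_per_dataset` maps feature name -> importance.
    """
    selected = set()
    for importances in importances_per_dataset:
        ranked = sorted(importances, key=importances.get, reverse=True)
        selected.update(ranked[:k])
    return selected
```

In practice, `fit` would wrap a real classifier and the importance dictionaries would come from that classifier's feature-importance output. A union of six top-20 lists can contain at most 120 distinct features, so a result of 45 would indicate substantial overlap between data sets.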
Pages: 4043-4054
Number of pages: 12