Machine Learning on Large-Scale Proteomics Data Identifies Tissue and Cell-Type Specific Proteins

Cited by: 4
Authors
Claeys, Tine [1, 2]
Menu, Maxime [1, 2]
Bouwmeester, Robbin [1, 2]
Gevaert, Kris [1, 2]
Martens, Lennart [1, 2]
Affiliations
[1] VIB, VIB UGent Ctr Med Biotechnol, B-9052 Ghent, Belgium
[2] Univ Ghent, Dept Biomol Med, B-9052 Ghent, Belgium
Funding
European Union Horizon 2020
Keywords
public data; proteomics; reprocessing; machine learning; tissue specificity; CANCER; PRIDE; DRAFT; LINES;
DOI
10.1021/acs.jproteome.2c00644
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Using data from 183 public human data sets from PRIDE, a machine learning model was trained to identify tissue and cell-type specific protein patterns. PRIDE projects were searched with ionbot, and tissue/cell type annotation was added manually. Data from physiological samples were used to train a Random Forest model on protein abundances to classify samples into tissues and cell types. Subsequently, one-vs-all classification and feature importance were used to analyze the most discriminating protein abundances per class. Based on protein abundance alone, the model predicted tissues with 98% accuracy and cell types with 99% accuracy. The F-scores give a clear view of tissue-specific proteins and tissue-specific protein expression patterns. In-depth feature analysis shows slight confusion between physiologically similar tissues, demonstrating the capacity of the algorithm to detect biologically relevant patterns. These results can in turn inform downstream uses, from identifying the tissue of origin of proteins in complex samples such as liquid biopsies to studying the proteome of tissue-like samples such as organoids and cell lines.
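The one-vs-all evaluation mentioned in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' Random Forest or their data: the tissue labels and predictions below are invented, the function names are hypothetical, and it assumes that "F-scores" refers to the usual per-class F1 score, which the abstract does not state explicitly.

```python
def one_vs_all_f1(y_true, y_pred):
    """Return {class: F1}, treating each class in turn as positive
    and every other class as negative (one-vs-all)."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
        fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
        fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy labels (assumption): one heart sample is misclassified as
# muscle, mimicking confusion between physiologically similar tissues.
y_true = ["liver", "liver", "heart", "heart", "muscle", "muscle"]
y_pred = ["liver", "liver", "heart", "muscle", "muscle", "muscle"]

f1 = one_vs_all_f1(y_true, y_pred)
acc = accuracy(y_true, y_pred)

for tissue, score in f1.items():
    print(f"{tissue}: F1 = {score:.2f}")
print(f"accuracy = {acc:.2f}")
```

In this toy case the single heart/muscle mix-up lowers the F1 of both classes while liver stays at 1.0, which is the kind of per-class pattern an overall accuracy figure hides.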
Pages: 1181-1192
Page count: 12