Machine Learning on Large-Scale Proteomics Data Identifies Tissue and Cell-Type Specific Proteins

Cited by: 4
Authors
Claeys, Tine [1, 2]
Menu, Maxime [1, 2]
Bouwmeester, Robbin [1, 2]
Gevaert, Kris [1, 2]
Martens, Lennart [1, 2]
Affiliations
[1] VIB, VIB UGent Ctr Med Biotechnol, B-9052 Ghent, Belgium
[2] Univ Ghent, Dept Biomol Med, B-9052 Ghent, Belgium
Funding
European Union Horizon 2020;
Keywords
public data; proteomics; reprocessing; machine learning; tissue specificity; CANCER; PRIDE; DRAFT; LINES;
DOI
10.1021/acs.jproteome.2c00644
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Using data from 183 public human data sets from PRIDE, a machine learning model was trained to identify tissue and cell-type specific protein patterns. PRIDE projects were searched with ionbot, and tissue/cell-type annotations were added manually. Data from physiological samples were used to train a Random Forest model on protein abundances to classify samples into tissues and cell types. Subsequently, a one-vs-all classification and feature importance were used to analyze the most discriminating protein abundances per class. Based on protein abundance alone, the model was able to predict tissues with 98% accuracy, and cell types with 99% accuracy. The F-scores provide a clear view of tissue-specific proteins and tissue-specific protein expression patterns. In-depth feature analysis shows slight confusion between physiologically similar tissues, demonstrating the capacity of the algorithm to detect biologically relevant patterns. These results can in turn inform downstream uses, from identifying the tissue of origin of proteins in complex samples such as liquid biopsies, to studying the proteome of tissue-like samples such as organoids and cell lines.
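The workflow summarized in the abstract (a Random Forest trained on protein abundances, followed by one-vs-all classification with feature importance) can be illustrated with a minimal sketch. This is not the authors' pipeline: the synthetic abundance matrix, the three tissue names, the planted "marker" protein indices, and the hyperparameters are all assumptions chosen for illustration, and scikit-learn is assumed as the Random Forest implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def make_toy_abundances(n_per_class=40, n_proteins=30, seed=0):
    """Build a synthetic samples-by-proteins matrix (hypothetical data).

    Each toy tissue k gets one planted marker protein (column k) whose
    abundance is shifted upward, so the expected answer is known.
    """
    rng = np.random.default_rng(seed)
    tissues = ["liver", "brain", "heart"]  # assumed labels, illustration only
    blocks, labels = [], []
    for k, tissue in enumerate(tissues):
        block = rng.normal(0.0, 1.0, size=(n_per_class, n_proteins))
        block[:, k] += 6.0  # planted tissue-specific protein
        blocks.append(block)
        labels += [tissue] * n_per_class
    return np.vstack(blocks), np.array(labels), tissues


def top_proteins_one_vs_all(X, y, target, n_top=3, seed=0):
    """Fit a binary (target vs. rest) forest and rank proteins by importance."""
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X, y == target)  # one-vs-all: True for the target tissue
    order = np.argsort(clf.feature_importances_)[::-1]
    return order[:n_top].tolist()


X, y, tissues = make_toy_abundances()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Multi-class model: classify held-out samples into tissues.
multi = RandomForestClassifier(n_estimators=200, random_state=0)
multi.fit(X_train, y_train)
accuracy = multi.score(X_test, y_test)

# One-vs-all models: most discriminating protein (column index) per tissue.
markers = {t: top_proteins_one_vs_all(X_train, y_train, t)[0] for t in tissues}

print(f"held-out accuracy: {accuracy:.2f}")
print("top protein index per tissue:", markers)
```

On real data the columns would be protein identifiers rather than indices, and impurity-based importances would need to be interpreted with care when abundances are correlated or sparsely observed.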
Pages: 1181-1192
Page count: 12