Machine Learning on Large-Scale Proteomics Data Identifies Tissue and Cell-Type Specific Proteins

Cited by: 4
Authors
Claeys, Tine [1, 2]
Menu, Maxime [1, 2]
Bouwmeester, Robbin [1, 2]
Gevaert, Kris [1, 2]
Martens, Lennart [1, 2]
Affiliations
[1] VIB, VIB UGent Ctr Med Biotechnol, B-9052 Ghent, Belgium
[2] Univ Ghent, Dept Biomol Med, B-9052 Ghent, Belgium
Funding
EU Horizon 2020
Keywords
public data; proteomics; reprocessing; machine learning; tissue specificity; CANCER; PRIDE; DRAFT; LINES;
DOI
10.1021/acs.jproteome.2c00644
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Using data from 183 public human data sets from PRIDE, a machine learning model was trained to identify tissue and cell-type specific protein patterns. PRIDE projects were searched with ionbot and tissue/cell type annotation was manually added. Data from physiological samples were used to train a Random Forest model on protein abundances to classify samples into tissues and cell types. Subsequently, a one-vs-all classification and feature importance were used to analyze the most discriminating protein abundances per class. Based on protein abundance alone, the model was able to predict tissues with 98% accuracy, and cell types with 99% accuracy. The F-scores describe a clear view on tissue-specific proteins and tissue-specific protein expression patterns. In-depth feature analysis shows slight confusion between physiologically similar tissues, demonstrating the capacity of the algorithm to detect biologically relevant patterns. These results can in turn inform downstream uses, from identification of the tissue of origin of proteins in complex samples such as liquid biopsies, to studying the proteome of tissue-like samples such as organoids and cell lines.
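The workflow described in the abstract (a Random Forest trained on protein abundances, followed by one-vs-all classification and feature importance) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the use of scikit-learn, the synthetic abundance matrix, the tissue labels, the planted marker proteins, and all parameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
tissues = ["liver", "brain", "heart"]  # hypothetical class labels
n_per_class, n_proteins = 60, 30       # hypothetical data set size

# Synthetic stand-in for a samples x proteins abundance matrix:
# protein i is made more abundant in tissue i (a planted "marker").
X_parts, y_parts = [], []
for i, tissue in enumerate(tissues):
    block = rng.normal(0.0, 1.0, size=(n_per_class, n_proteins))
    block[:, i] += 5.0
    X_parts.append(block)
    y_parts.extend([tissue] * n_per_class)
X = np.vstack(X_parts)
y = np.array(y_parts)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Multi-class Random Forest: predict the tissue from abundances alone.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# One-vs-all: for each tissue, fit "this tissue vs. everything else"
# and record the protein with the highest feature importance.
top_protein = {}
for tissue in tissues:
    ova = RandomForestClassifier(n_estimators=200, random_state=0)
    ova.fit(X_train, y_train == tissue)
    top_protein[tissue] = int(np.argmax(ova.feature_importances_))

print("held-out accuracy:", accuracy)
print("most discriminating protein index per tissue:", top_protein)
```

On real data the rows would be PRIDE samples and the columns quantified proteins; the per-class importance ranking is what would point to candidate tissue-specific proteins.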
Pages: 1181-1192
Number of pages: 12