Research progress and application of retention time prediction method based on deep learning

Cited by: 2
Authors
Du Zhuokun [1, 2]
Shao Wei [1]
Qin Weijie [1, 2]
Affiliations
[1] Anhui Med Univ, Sch Basic Med, Hefei 230032, Peoples R China
[2] Beijing Inst Life, Beijing Proteome Res Ctr, State Key Lab Prote, Beijing 102206, Peoples R China
Keywords
liquid chromatography-tandem mass spectrometry (LC-MS/MS); retention time; deep learning; proteomics; PEPTIDES; PROTEIN; IDENTIFICATION; CHROMATOGRAPHY; PROTEOMICS; SEQUENCE; CANCER; REPRODUCIBILITY; STRATEGY; LEVEL
DOI
10.3724/SP.J.1123.2020.08015
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
In the "shotgun" proteomics strategy, the proteome is characterized by analyzing tryptically digested peptides using liquid chromatography-mass spectrometry. In this strategy, the retention time of peptides in liquid chromatography separation can be predicted from the peptide sequence, which makes it a useful feature for peptide identification. The prediction of retention time has therefore attracted much research attention. Traditional methods calculate the physical and chemical properties of peptides from their amino acid sequences to obtain the retention time under certain chromatography conditions; however, these methods cannot be directly adopted for other chromatography conditions, nor can they be used across laboratories or instrument platforms. To solve this problem, deep learning has been introduced to proteomics research for retention time prediction in recent years. Deep learning is an advanced machine-learning method with an extraordinary capability to learn complex relationships from large-scale data. By stacking multiple hidden layers of a neural network, deep learning can ingest raw data without manually designed features. Transfer learning is an important method in deep learning. It improves the learning of a new task through the transfer of knowledge from an already-learned related task. Transfer learning allows models trained on large datasets to be used across conditions by fine-tuning on smaller datasets, instead of retraining the whole model. Many retention time prediction methods have been developed. During model training, peptide sequences are encoded to represent peptide information. Deep learning learns the relationship between the characteristics of the peptides and their corresponding retention times without the need for manual input of the physical and chemical properties of the peptides.
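The sequence-encoding step mentioned above can be illustrated with a minimal sketch. It assumes a plain one-hot encoding over the 20 standard amino acids with zero padding to a fixed length; the abstract does not say which encoding any particular method uses, and the names `AMINO_ACIDS`, `one_hot_encode`, and `max_len` are hypothetical, chosen only for this illustration.

```python
# Assumption: one-hot encoding over the 20 standard amino acids.
# Real retention time predictors may use other encodings or learned embeddings.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide, max_len=30):
    """Return a max_len x 20 matrix (list of lists) for one peptide.

    Row i is the one-hot vector of residue i; rows past the end of the
    peptide stay all-zero so every peptide has the same input shape.
    """
    if len(peptide) > max_len:
        raise ValueError("peptide is longer than max_len")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(peptide):
        if aa not in AA_INDEX:
            raise ValueError(f"unsupported residue: {aa!r}")
        matrix[pos][AA_INDEX[aa]] = 1
    return matrix


# Hypothetical example peptide, padded to 10 positions.
encoded = one_hot_encode("PEPTIDEK", max_len=10)
print(len(encoded), len(encoded[0]))
```

A matrix like this is the kind of raw input a network can take in place of hand-computed physicochemical descriptors. Modified residues would need extra symbols in the alphabet, which this sketch does not cover.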
Compared with traditional methods, deep learning methods have higher accuracy and can be easily used under different chromatography conditions through transfer learning. If there are not enough data to train a new model, a model trained on other datasets can be used instead after calibration with small datasets obtained under the target chromatography conditions. While the retention times of modified peptides can also be predicted, the predictions are inadequate for complex modifications such as glycosylation, and this is one of the main problems to be solved. The predicted retention times can be used to control the quality of peptide identification. When prediction accuracy is high, the predicted retention times can be treated as proxies for the actual retention times. Therefore, the difference between predicted and observed retention times can serve as an effective and unbiased quantitative metric for evaluating the quality of peptide-spectrum matches (PSMs) reported by different peptide identification methods. Combined with fragment ion intensity prediction, retention time prediction is used to generate spectral libraries for data-independent acquisition (DIA)-based mass spectrometry analysis. Generally, DIA methods identify peptides using specific spectral libraries obtained from data-dependent acquisition (DDA) experiments. As a result, only peptides detected in the DDA experiments can be present in the libraries and detected in DIA. Furthermore, it takes a lot of time and effort to build libraries from DDA experiments, and typically, they cannot be adopted across different laboratories or instrument platforms. In contrast, the pseudo spectral libraries generated by retention time and fragment ion intensity prediction can overcome these shortcomings. The pseudo spectral libraries contain theoretical spectra of all possible peptides without the need for DDA experiments.
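The calibration and quality-control ideas above can be sketched as follows. This is only an illustration under stated assumptions: calibration is modeled as a simple least-squares line from model output to locally observed retention time (the abstract does not specify the calibration method), all numbers are invented example data in minutes, and `fit_linear_calibration`, `delta_rt`, and the 5-minute tolerance are hypothetical.

```python
def fit_linear_calibration(predicted, observed):
    """Least-squares line observed ~ slope * predicted + intercept."""
    n = len(predicted)
    if n < 2 or n != len(observed):
        raise ValueError("need at least two paired values")
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def delta_rt(predicted, observed, slope, intercept):
    """Absolute difference between observed and calibrated predicted RT."""
    return [abs(o - (slope * p + intercept)) for p, o in zip(predicted, observed)]


# Invented calibration set: pretrained-model output vs. local observed RT.
cal_pred = [10.0, 20.0, 30.0, 40.0]
cal_obs = [12.0, 32.0, 52.0, 72.0]
slope, intercept = fit_linear_calibration(cal_pred, cal_obs)

# Invented PSMs to check against the calibrated predictions.
psm_pred = [15.0, 25.0, 35.0]
psm_obs = [22.5, 41.0, 80.0]
deltas = delta_rt(psm_pred, psm_obs, slope, intercept)

# Hypothetical tolerance: PSMs deviating by more than 5 minutes are flagged.
flagged = [d > 5.0 for d in deltas]
print(deltas, flagged)
```

A large deviation does not prove that a PSM is wrong; it only marks the match as worth a closer look, and a fixed cutoff is a simplification of how such a metric would be used in practice.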
This paper reviews the research progress of deep learning methods for retention time prediction and their related applications, in order to provide a reference for retention time prediction and protein identification. It also discusses the development directions and application trends of retention time prediction methods based on deep learning.
Pages: 211-218
Number of pages: 8