PGPointNovo: an efficient neural network-based tool for parallel de novo peptide sequencing

Cited by: 2
Authors
Xu, Xiaofang [1]
Yang, Chunde [1]
He, Qiang [3]
Shu, Kunxian [5]
Xinpu, Yuan [6]
Chen, Zhiguang [4]
Zhu, Yunping [2]
Chen, Tao [2]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing 400065, Peoples R China
[2] Beijing Inst Life, Beijing Proteome Res Ctr, Natl Ctr Prot Sci Beijing, State Key Lab Prote, Beijing 102206, Peoples R China
[3] Swinburne Univ Technol, Sch Software & Elect Engn, Melbourne, Vic 3122, Australia
[4] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 26469, Peoples R China
[5] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
[6] Chinese Peoples Liberat Army Gen Hosp, Med Ctr 1, Dept Gen Surg, Beijing, Peoples R China
Source
BIOINFORMATICS ADVANCES | 2023 / Vol. 3 / Issue 01
Keywords
CANCER; NUMBER; VALIDATION; CLUSTERS;
DOI
10.1093/bioadv/vbad057
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
De novo peptide sequencing for tandem mass spectrometry data is not only a key technology for novel peptide identification, but also a prerequisite for many downstream tasks, such as vaccine and antibody studies. In recent years, neural network models for de novo peptide sequencing have shown a remarkable ability to accommodate various data sources and have outperformed conventional peptide identification tools. However, such a high-performing model is computationally expensive, taking up to 1 week to process about 400 000 spectra. This article presents PGPointNovo, a novel neural network-based tool for parallel de novo peptide sequencing. PGPointNovo uses data parallelization to accelerate training and inference and mitigates the training difficulties caused by large batch sizes. The results of extensive experiments conducted on multiple datasets of different sizes demonstrate that, compared with PointNovo, a high-performing neural network-based de novo peptide sequencing tool, PGPointNovo accelerates de novo peptide sequencing by up to 7.35x without compromising precision or recall.
Pages: 3
Related papers
50 in total
  • [21] Parallel algorithms for the training process of a neural network-based system
    Ammar, HH
    Miao, ZH
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2000, 14 (01): : 3 - 25
  • [22] A Parallel Neural Network-based Scheme for Radar Emitter Recognition
    Ha Phan Khanh Nguyen
    Van Long Do
    Quang Trung Dong
    PROCEEDINGS OF THE 2020 14TH INTERNATIONAL CONFERENCE ON UBIQUITOUS INFORMATION MANAGEMENT AND COMMUNICATION (IMCOM), 2020,
  • [23] Peptide and protein de novo sequencing by mass spectrometry
    Standing, KG
    CURRENT OPINION IN STRUCTURAL BIOLOGY, 2003, 13 (05) : 595 - 601
  • [24] A model of random sequences for de novo peptide sequencing
    Jarman, KD
    Cannon, WR
    Jarman, KH
    Heredia-Langner, A
    THIRD IEEE SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING - BIBE 2003, PROCEEDINGS, 2003, : 206 - 213
  • [25] De novo peptide sequencing using exhaustive enumeration of peptide composition
    Olson, Matthew T.
    Epstein, Jonathan A.
    Yergey, Alfred L.
    JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 2006, 17 (08) : 1041 - 1049
  • [26] De Novo Peptide Sequencing Using Exhaustive Enumeration of Peptide Composition
    Olson, Matthew T.
    Epstein, Jonathan A.
    Yergey, Alfred L.
    Journal of the American Society for Mass Spectrometry, 2006, 17 (08): : 1041 - 1049
  • [27] Efficient de novo assembly of large and complex genomes by massively parallel sequencing of Fosmid pools
    Alexeyenko, Andrey
    Nystedt, Bjoern
    Vezzi, Francesco
    Sherwood, Ellen
    Ye, Rosa
    Knudsen, Bjarne
    Simonsen, Martin
    Turner, Benjamin
    de Jong, Pieter
    Wu, Cheng-Cang
    Lundeberg, Joakim
    BMC GENOMICS, 2014, 15
  • [28] Efficient de novo assembly of large and complex genomes by massively parallel sequencing of Fosmid pools
    Andrey Alexeyenko
    Björn Nystedt
    Francesco Vezzi
    Ellen Sherwood
    Rosa Ye
    Bjarne Knudsen
    Martin Simonsen
    Benjamin Turner
    Pieter de Jong
    Cheng-Cang Wu
    Joakim Lundeberg
    BMC Genomics, 15
  • [29] Use of deuterium-labeled lysine for efficient protein identification and peptide de novo sequencing
    Gu, S
    Pan, SQ
    Bradbury, EM
    Chen, X
    ANALYTICAL CHEMISTRY, 2002, 74 (22) : 5774 - 5785
  • [30] Efficient Neural Network-based Estimation of Interval Shapley Values
    Napolitano D.
    Vaiani L.
    Cagliero L.
    IEEE Transactions on Knowledge and Data Engineering, 2024, 36 (12) : 1 - 12