MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra

Cited by: 79
Authors
Huber, Florian [1]
van der Burg, Sven [1]
van der Hooft, Justin J. J. [2]
Ridder, Lars [1]
Affiliations
[1] Netherlands eSci Ctr, NL-1098 XG Amsterdam, Netherlands
[2] Wageningen Univ, Bioinformat Grp, NL-6708 PB Wageningen, Netherlands
Keywords
Mass spectrometry; Metabolomics; Spectral similarity measure; Supervised machine learning; Deep learning; Databases
DOI
10.1186/s13321-021-00558-4
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Mass spectrometry data is one of the key sources of information in many workflows in medicine and across the life sciences. Mass fragmentation spectra are generally considered to be characteristic signatures of the chemical compound they originate from, yet the chemical structure itself usually cannot be easily deduced from the spectrum. Spectral similarity measures are often used as a proxy for structural similarity, but this approach is strongly limited by the generally poor correlation between the two metrics. Here, we propose MS2DeepScore: a novel Siamese neural network that predicts the structural similarity between two chemical structures based solely on their MS/MS fragmentation spectra. Using a cleaned dataset of > 100,000 mass spectra of about 15,000 unique known compounds, we trained MS2DeepScore to predict structural similarity scores for spectrum pairs with high accuracy. In addition, sampling different model variants through Monte-Carlo Dropout is used to further improve the predictions and to assess the model's prediction uncertainty. On 3600 spectra of 500 unseen compounds, MS2DeepScore is able to identify highly reliable structural matches and to predict Tanimoto scores for pairs of molecules based on their fragment spectra with a root mean squared error of about 0.15. Furthermore, the prediction uncertainty estimate can be used to select a subset of predictions with a root mean squared error of about 0.1. We also demonstrate that MS2DeepScore outperforms classical spectral similarity measures in retrieving chemically related compound pairs from large mass spectral datasets, thereby illustrating its potential for spectral library matching. Finally, MS2DeepScore can be used to create chemically meaningful mass spectral embeddings that could be used to cluster large numbers of spectra.
Together with the recently introduced unsupervised Spec2Vec metric, these results lead us to believe that machine learning-supported mass spectral similarity measures have great potential for a range of metabolomics data processing pipelines.
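The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the MS2DeepScore implementation: the function names, the single-layer toy "network", the weight matrix, the dropout rate, and the use of mean and standard deviation to summarise the sampled scores are all assumptions made for illustration. The sketch shows a Tanimoto score computed from two fingerprints (given as sets of on-bit indices), the root mean squared error between predicted and reference scores, and Monte-Carlo Dropout as repeated stochastic forward passes whose spread serves as an uncertainty estimate.

```python
import math
import random
import statistics


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) score of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rmse(predicted, reference):
    """Root mean squared error between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def embed(spectrum, weights, rng, rate):
    """Hypothetical one-layer encoder: input dropout, then a linear map."""
    # Inverted dropout: zero each input with probability `rate`, rescale the rest.
    kept = [0.0 if rng.random() < rate else x / (1.0 - rate) for x in spectrum]
    return [sum(w * x for w, x in zip(row, kept)) for row in weights]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def mc_dropout_similarity(spec_a, spec_b, weights, n_samples=100, rate=0.2, seed=0):
    """Run several stochastic forward passes; return (mean score, spread)."""
    rng = random.Random(seed)
    scores = [
        cosine(embed(spec_a, weights, rng, rate), embed(spec_b, weights, rng, rate))
        for _ in range(n_samples)
    ]
    return statistics.mean(scores), statistics.pstdev(scores)


if __name__ == "__main__":
    print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 2 shared bits / 4 bits in union = 0.5
    toy_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]  # made-up encoder weights
    mean_score, spread = mc_dropout_similarity(
        [1.0, 0.5, 0.0], [0.9, 0.6, 0.1], toy_weights
    )
    print(mean_score, spread)
```

In this toy setting, keeping only the pairs whose spread falls below a chosen threshold mirrors the idea of selecting a lower-error subset of predictions by their uncertainty estimate.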
Pages: 14
Related papers
24 in total
  • [1] MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra
    Florian Huber
    Sven van der Burg
    Justin J. J. van der Hooft
    Lars Ridder
    Journal of Cheminformatics, 13
  • [2] Deep learning embedder method and tool for mass spectra similarity search
    Qin, Chunyuan
    Luo, Xiyang
    Deng, Chuan
    Shu, Kunxian
    Zhu, Weimin
    Griss, Johannes
    Hermjakob, Henning
    Bai, Mingze
    Perez-Riverol, Yasset
    JOURNAL OF PROTEOMICS, 2021, 232
  • [3] SPEQ: quality assessment of peptide tandem mass spectra with deep learning
    Gholamizoj, Soroosh
    Ma, Bin
    BIOINFORMATICS, 2022, 38 (06) : 1568 - 1574
  • [4] Deep learning prediction of glycopeptide tandem mass spectra powers glycoproteomics
    Zong, Yu
    Wang, Yuxin
    Qiu, Xipeng
    Huang, Xuanjing
    Qiao, Liang
    NATURE MACHINE INTELLIGENCE, 2024, 6 (08) : 950 - 961
  • [5] Deep learning prediction of electrospray ionization tandem mass spectra of chemically derived molecules
    Chen, Bin
    Li, Hailiang
    Huang, Rongfu
    Tang, Yanan
    Li, Feng
    NATURE COMMUNICATIONS, 2024, 15 (01)
  • [6] Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning
    Siegfried Gessulat
    Tobias Schmidt
    Daniel Paul Zolg
    Patroklos Samaras
    Karsten Schnatbaum
    Johannes Zerweck
    Tobias Knaute
    Julia Rechenberger
    Bernard Delanghe
    Andreas Huhmer
    Ulf Reimer
    Hans-Christian Ehrlich
    Stephan Aiche
    Bernhard Kuster
    Mathias Wilhelm
    Nature Methods, 2019, 16 : 509 - 518
  • [7] Deep kernel learning improves molecular fingerprint prediction from tandem mass spectra
    Duehrkop, Kai
    BIOINFORMATICS, 2022, 38 (SUPPL 1) : 342 - 349
  • [8] Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning
    Gessulat, Siegfried
    Schmidt, Tobias
    Zolg, Daniel Paul
    Samaras, Patroklos
    Schnatbaum, Karsten
    Zerweck, Johannes
    Knaute, Tobias
    Rechenberger, Julia
    Delanghe, Bernard
    Huhmer, Andreas
    Reimer, Ulf
    Ehrlich, Hans-Christian
    Aiche, Stephan
    Kuster, Bernhard
    Wilhelm, Mathias
    NATURE METHODS, 2019, 16 (06) : 509 - +
  • [9] Current and future deep learning algorithms for tandem mass spectrometry (MS/MS)-based small molecule structure elucidation
    Liu, Youzhong
    De Vijlder, Thomas
    Bittremieux, Wout
    Laukens, Kris
    Heyndrickx, Wouter
    RAPID COMMUNICATIONS IN MASS SPECTROMETRY, 2021,
  • [10] MS2Grouper: Group assessment and synthetic replacement of duplicate proteomic tandem mass spectra
    Tabb, DL
    Thompson, MR
    Khalsa-Moyers, G
    VerBerkmoes, NC
    McDonald, WH
    JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 2005, 16 (08) : 1250 - 1261