Identification of metabolites from tandem mass spectra with a machine learning approach utilizing structural features

Cited by: 24
Authors
Li, Yuanyue [1]
Kuhn, Michael [1]
Gavin, Anne-Claude [1, 2, 5]
Bork, Peer [1, 2, 3, 4]
Affiliations
[1] European Mol Biol Lab, Struct & Computat Biol Unit, Heidelberg, Germany
[2] Mol Med Partnership Unit, D-69117 Heidelberg, Germany
[3] Max Delbruck Ctr Mol Med, D-13125 Berlin, Germany
[4] Univ Wurzburg, Bioctr, Dept Bioinformat, D-97074 Wurzburg, Germany
[5] Univ Geneva, Ctr Med Univ, Dept Cell Physiol & Metab, Geneva, Switzerland
Keywords
SPECTROMETRY DATA; FRAGMENTATION;
DOI
10.1093/bioinformatics/btz736
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Untargeted tandem mass spectrometry (MS/MS) is a powerful method for detecting metabolites in biological samples. However, fast and accurate identification of the metabolites' structures from MS/MS spectra remains a major challenge.
Results: We present a new analysis method, called SubFragment-Matching (SF-Matching), that is based on the hypothesis that molecules with similar structural features will exhibit similar fragmentation patterns. We combine information on the fragmentation patterns of molecules with shared substructures and then use random forest models to predict whether a given structure can yield a certain fragmentation pattern. These models can then be used to score candidate molecules for a given mass spectrum. For rapid identification, we pre-compute such scores for common biological molecular structure databases. Using benchmarking datasets, we find that our method performs similarly to CSI:FingerID and that very high accuracies can be achieved by combining our method with CSI:FingerID. Rarefaction analysis of the training dataset shows that the performance of our method will increase as more experimental data become available.
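The scoring idea in the abstract (learn which structural features go with which fragmentation patterns, then rank candidate structures by how well they explain an observed spectrum) can be illustrated with a small sketch. This is not the authors' implementation: a simple substructure-frequency estimator stands in for the random forest models, and every name here (`train_pattern_models`, `predict`, `score_candidates`, the substructure and pattern labels, the toy data) is hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def train_pattern_models(training):
    """Estimate, per fragmentation pattern and substructure, the fraction of
    training molecules containing that substructure that showed the pattern.

    training: list of (substructures, observed_patterns) pairs, both sets of str.
    NOTE: a frequency table is an assumed stand-in for the random forests
    described in the abstract.
    """
    seen = defaultdict(int)                       # molecules per substructure
    hits = defaultdict(lambda: defaultdict(int))  # pattern -> substructure -> count
    for substructures, patterns in training:
        for s in substructures:
            seen[s] += 1
            for p in patterns:
                hits[p][s] += 1
    return {p: {s: c / seen[s] for s, c in by_sub.items()}
            for p, by_sub in hits.items()}


def predict(model, pattern, substructures):
    """Rough probability that a molecule with these substructures yields the
    pattern: the mean of the per-substructure rates (0.0 for unseen pairs)."""
    if not substructures:
        return 0.0
    rates = model.get(pattern, {})
    return sum(rates.get(s, 0.0) for s in substructures) / len(substructures)


def score_candidates(model, spectrum_patterns, candidates):
    """Score each candidate by summing its predicted probabilities over the
    patterns observed in the spectrum; return (name, score) pairs, best first."""
    scores = {name: sum(predict(model, p, subs) for p in spectrum_patterns)
              for name, subs in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up training data: (substructures, fragmentation patterns observed).
training = [
    ({"carboxyl", "phenyl"}, {"loss_CO2", "frag_77"}),
    ({"carboxyl", "amine"}, {"loss_CO2", "loss_NH3"}),
    ({"amine", "hydroxyl"}, {"loss_NH3", "loss_H2O"}),
]
model = train_pattern_models(training)

# Patterns extracted from a query spectrum, and two hypothetical candidates.
spectrum_patterns = {"loss_CO2", "frag_77"}
candidates = {
    "cand_A": {"carboxyl", "phenyl"},
    "cand_B": {"amine", "hydroxyl"},
}
ranked = score_candidates(model, spectrum_patterns, candidates)
print(ranked)
```

Because the per-candidate scores depend only on the candidate structure and the pattern, a table like `model` could in principle be evaluated once for every entry in a structure database, which is the kind of pre-computation the abstract mentions for rapid identification.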
Pages: 1213-1218
Number of pages: 6