The METLIN small molecule dataset for machine learning-based retention time prediction

Times cited: 132
Authors
Domingo-Almenara, Xavier [1, 4]
Guijas, Carlos [1]
Billings, Elizabeth [1]
Montenegro-Burke, J. Rafael [1]
Uritboonthai, Winnie [1]
Aisporna, Aries E. [1]
Chen, Emily [2]
Benton, H. Paul [1]
Siuzdak, Gary [1, 3]
Affiliations
[1] Scripps Res Inst, Scripps Ctr Metabol, La Jolla, CA 92037 USA
[2] Scripps Res Inst, Calif Inst Biomed Res Calibr, La Jolla, CA 92037 USA
[3] Scripps Res Inst, Dept Integrat Struct & Computat Biol, La Jolla, CA 92037 USA
[4] EURECAT Technol Ctr Catalonia & Rovira & Virgili, Ctr Omic Sci, Reus, Catalonia, Spain
Funding
National Institutes of Health;
Keywords
METABOLITE IDENTIFICATION; DIFFERENT GRADIENTS; WEB SERVER; FLOW-RATES; LIQUID; ANNOTATION; PROJECTION; SPECTRUM; MODELS;
DOI
10.1038/s41467-019-13680-7
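Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];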
Discipline classification code
07; 0710; 09;
Abstract
Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes including mass spectrometry fragmentation or chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due to limited available experimental data. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 small molecules. To demonstrate the utility of this dataset, we deployed a deep learning model for retention time prediction applied to small molecule annotation. Results showed that in 70% of the cases, the correct molecular identity was ranked among the top 3 candidates based on their predicted retention time. We anticipate that this dataset will enable the community to apply machine learning or first principles strategies to generate better models for retention time prediction.
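The candidate-ranking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each unknown feature already has a short list of candidate identities (for example, from a mass-based database search) and a predicted retention time for each candidate. It then ranks candidates by the absolute difference between predicted and observed retention time and reports how often the true identity lands in the top k. The function names `rank_candidates` and `top_k_accuracy`, the absolute-error scoring, and the toy retention times (in seconds) are hypothetical and do not come from the paper or the SMRT dataset. The authors' actual scoring and evaluation may differ.

```python
from typing import Dict, List, Sequence, Tuple

# A query is (observed_rt, {candidate_name: predicted_rt}, true_identity).
# This structure is a hypothetical illustration, not the SMRT data format.
Query = Tuple[float, Dict[str, float], str]


def rank_candidates(observed_rt: float, predicted_rts: Dict[str, float]) -> List[str]:
    """Return candidate names sorted by |predicted RT - observed RT|, smallest first.

    Python's sort is stable, so tied candidates keep their insertion order.
    """
    return sorted(predicted_rts, key=lambda name: abs(predicted_rts[name] - observed_rt))


def top_k_accuracy(queries: Sequence[Query], k: int = 3) -> float:
    """Fraction of queries whose true identity is among the top-k ranked candidates."""
    if not queries:
        return 0.0
    hits = sum(
        1
        for observed_rt, predicted_rts, true_id in queries
        if true_id in rank_candidates(observed_rt, predicted_rts)[:k]
    )
    return hits / len(queries)


# Toy, made-up retention times in seconds (hypothetical, not measured values).
example_queries: List[Query] = [
    # Errors: A=7, C=14, B=108, D=288, so the true identity "A" ranks first.
    (312.0, {"A": 305.0, "B": 420.0, "C": 298.0, "D": 600.0}, "A"),
    # Errors: F=5, G=10, H=10, E=350, so the true identity "E" ranks last.
    (150.0, {"E": 500.0, "F": 155.0, "G": 140.0, "H": 160.0}, "E"),
]

if __name__ == "__main__":
    print(rank_candidates(*example_queries[0][:2]))  # ['A', 'C', 'B', 'D']
    print(top_k_accuracy(example_queries, k=3))      # 0.5
```

Under this simplified scoring the first toy query is a top-3 hit and the second is a miss, giving a top-3 accuracy of 0.5. The 70% figure in the abstract comes from the authors' own evaluation, not from this example.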
Pages: 9
Source
Nature Communications, 10