The METLIN small molecule dataset for machine learning-based retention time prediction

Cited by: 132
Authors
Domingo-Almenara, Xavier [1, 4]
Guijas, Carlos [1]
Billings, Elizabeth [1]
Montenegro-Burke, J. Rafael [1]
Uritboonthai, Winnie [1]
Aisporna, Aries E. [1]
Chen, Emily [2]
Benton, H. Paul [1]
Siuzdak, Gary [1, 3]
Affiliations
[1] Scripps Res Inst, Scripps Ctr Metabol, La Jolla, CA 92037 USA
[2] Scripps Res Inst, Calif Inst Biomed Res Calibr, La Jolla, CA 92037 USA
[3] Scripps Res Inst, Dept Integrat Struct & Computat Biol, La Jolla, CA 92037 USA
[4] EURECAT Technol Ctr Catalonia & Rovira & Virgili, Ctr Omic Sci, Reus, Catalonia, Spain
Funding
US National Institutes of Health (NIH)
Keywords
METABOLITE IDENTIFICATION; DIFFERENT GRADIENTS; WEB SERVER; FLOW-RATES; LIQUID; ANNOTATION; PROJECTION; SPECTRUM; MODELS;
DOI
10.1038/s41467-019-13680-7
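Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract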
Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes including mass spectrometry fragmentation or chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due to limited available experimental data. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 small molecules. To demonstrate the utility of this dataset, we deployed a deep learning model for retention time prediction applied to small molecule annotation. Results showed that in 70% of the cases, the correct molecular identity was ranked among the top 3 candidates based on their predicted retention time. We anticipate that this dataset will enable the community to apply machine learning or first principles strategies to generate better models for retention time prediction.
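The annotation step described in the abstract, ranking candidate molecules by how closely their predicted retention time matches the observed one, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `rank_candidates` and `top_k_rate`, the candidate labels, and all retention time values below are hypothetical and chosen only for illustration, and the predicted values stand in for the output of any retention time model.

```python
from typing import Dict, List, Tuple

def rank_candidates(observed_rt: float, predicted_rt: Dict[str, float]) -> List[str]:
    """Order candidate names by |predicted RT - observed RT|, closest first."""
    return sorted(predicted_rt, key=lambda name: abs(predicted_rt[name] - observed_rt))

def top_k_rate(cases: List[Tuple[float, Dict[str, float], str]], k: int = 3) -> float:
    """Fraction of cases whose true identity is among the k closest candidates."""
    hits = sum(
        1
        for observed_rt, predicted_rt, truth in cases
        if truth in rank_candidates(observed_rt, predicted_rt)[:k]
    )
    return hits / len(cases)

# Hypothetical toy data: (observed RT in seconds, predicted RT per candidate, true identity).
cases = [
    (300.0, {"A": 310.0, "B": 450.0, "C": 295.0, "D": 700.0}, "A"),
    (520.0, {"E": 100.0, "F": 515.0, "G": 530.0, "H": 525.0, "I": 522.0}, "E"),
]

print(rank_candidates(*cases[0][:2]))
print(top_k_rate(cases, k=3))
```

In practice the candidate list would typically come from a mass-based database search, and a statistic such as the 70% top-3 figure reported in the abstract corresponds to this kind of rate computed over many annotated cases.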
Pages: 9