Machine Learning-Based Hazard-Driven Prioritization of Features in Nontarget Screening of Environmental High-Resolution Mass Spectrometry Data

Citations: 18
Authors
Arturi, Katarzyna [1]
Hollender, Juliane [1,2]
Affiliations
[1] Swiss Fed Inst Aquat Sci & Technol Eawag, Dept Environm Chem, CH-8600 Dubendorf, Switzerland
[2] Eidgenoss TH Zurich ETH Zurich, Inst Biogeochem & Pollut Dynam, CH-8092 Zurich, Switzerland
Keywords
ToxCast; Tox21; toxicity prediction; HRMS; MS; supervised classification; extreme gradient boosting; SIRIUS; IN-VITRO; PREDICTION; CHEMISTRY; TOXICITY; LIBRARY; MODELS; ASSAY;
DOI
10.1021/acs.est.3c00304
Chinese Library Classification
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
MLinvitroTox maps toxicologically relevant pollution in aquatic environments by predicting the toxicity of unidentified NTS HRMS/MS features from fragmentation spectra via machine learning. Nontarget high-resolution mass spectrometry screening (NTS HRMS/MS) can detect thousands of organic substances in environmental samples. However, new strategies are needed to focus time-intensive identification efforts on features with the highest potential to cause adverse effects instead of the most abundant ones. To address this challenge, we developed MLinvitroTox, a machine learning framework that uses molecular fingerprints derived from fragmentation spectra (MS2) for a rapid classification of thousands of unidentified HRMS/MS features as toxic/nontoxic based on nearly 400 target-specific and over 100 cytotoxic endpoints from ToxCast/Tox21. Model development results demonstrated that using customized molecular fingerprints and models, over a quarter of toxic endpoints and the majority of the associated mechanistic targets could be accurately predicted with sensitivities exceeding 0.95. Notably, SIRIUS molecular fingerprints and xboost (Extreme Gradient Boosting) models with SMOTE (Synthetic Minority Oversampling Technique) for handling data imbalance were a universally successful and robust modeling configuration. Validation of MLinvitroTox on MassBank spectra showed that toxicity could be predicted from molecular fingerprints derived from MS2 with an average balanced accuracy of 0.75. By applying MLinvitroTox to environmental HRMS/MS data, we confirmed the experimental results obtained with target analysis and narrowed the analytical focus from tens of thousands of detected signals to 783 features linked to potential toxicity, including 109 spectral matches and 30 compounds with confirmed toxic activity.
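As a reading aid for the two metrics quoted in the abstract, the minimal Python sketch below shows how sensitivity and balanced accuracy are conventionally computed for a binary toxic/nontoxic classifier. The labels and predictions are made-up illustration data, not results from the paper, and the function names are hypothetical; the authors' actual evaluation code is not shown in this record.

```python
# Illustrative only: conventional definitions of sensitivity, specificity,
# and balanced accuracy for binary labels (1 = toxic, 0 = nontoxic).
# The data below are invented for the example, not taken from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(y_true, y_pred):
    """Fraction of truly toxic samples that were predicted toxic."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0:
        raise ValueError("no positive (toxic) samples in y_true")
    return tp / (tp + fn)

def specificity(y_true, y_pred):
    """Fraction of truly nontoxic samples that were predicted nontoxic."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    if tn + fp == 0:
        raise ValueError("no negative (nontoxic) samples in y_true")
    return tn / (tn + fp)

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    return (sensitivity(y_true, y_pred) + specificity(y_true, y_pred)) / 2

# Imbalanced toy set: 2 toxic and 8 nontoxic samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", plain_accuracy)                               # 0.8
print("sensitivity:", sensitivity(y_true, y_pred))               # 0.5
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))   # 0.6875
```

In this toy case plain accuracy (0.8) looks better than balanced accuracy (0.6875) because the majority nontoxic class dominates, which is the kind of imbalance that oversampling methods such as SMOTE are meant to address.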
Pages: 18067-18079
Page count: 13
Related papers
50 in total
  • [1] Nontarget screening strategies for PFAS prioritization and identification by high resolution mass spectrometry: A review
    Bugsel, Boris
    Zweigle, Jonathan
    Zwiener, Christian
    [J]. TRENDS IN ENVIRONMENTAL ANALYTICAL CHEMISTRY, 2023, 40
  • [2] Fully Automated Unconstrained Analysis of High-Resolution Mass Spectrometry Data with Machine Learning
    Boiko, Daniil A.
    Kozlov, Konstantin S.
    Burykina, Julia V.
    Ilyushenkova, Valentina V.
    Ananikov, Valentine P.
    [J]. JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 2022, 144 (32) : 14590 - 14606
  • [3] Comparison of Software Tools for Liquid Chromatography-High-Resolution Mass Spectrometry Data Processing in Nontarget Screening of Environmental Samples
    Hohrenk, Lotta L.
    Itzel, Fabian
    Baetz, Nicolai
    Tuerk, Jochen
    Vosough, Maryam
    Schmidt, Torsten C.
    [J]. ANALYTICAL CHEMISTRY, 2020, 92 (02) : 1898 - 1907
  • [4] Online and Offline Prioritization of Chemicals of Interest in Suspect Screening and Non-targeted Screening with High-Resolution Mass Spectrometry
    Szabo, Drew
    Falconer, Travis M.
    Fisher, Christine M.
    Heise, Ted
    Phillips, Allison L.
    Vas, Gyorgy
    Williams, Antony J.
    Kruve, Anneli
    [J]. ANALYTICAL CHEMISTRY, 2024, 96 (09) : 3707 - 3716
  • [5] Development of a Machine Learning Algorithm for Drug Screening Analysis on High-Resolution UPLC-MSE/QTOF Mass Spectrometry
    Hao, Ying
    Lynch, Kara
    Fan, Pengcheng
    Jurtschenko, Christopher
    Cid, Maria
    Zhao, Zhen
    Yang, He S.
    [J]. JOURNAL OF APPLIED LABORATORY MEDICINE, 2023, 8 (01) : 53 - 66
  • [6] High-Resolution Satellite Bathymetry Mapping: Regression and Machine Learning-Based Approaches
    Eugenio, Francisco
    Marcello, Javier
    Mederos-Barrera, Antonio
    Marques, Ferran
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [7] Data-Driven and Machine Learning-Based Framework for Image-Guided Single-Cell Mass Spectrometry
    Xie, Yuxuan Richard
    Chari, Varsha K.
    Castro, Daniel C.
    Grant, Romans
    Rubakhin, Stanislav S.
    Sweedler, Jonathan V.
    [J]. JOURNAL OF PROTEOME RESEARCH, 2023, 22 (02) : 491 - 500
  • [8] Machine-learning assisted molecular formula assignment to high-resolution mass spectrometry data of dissolved organic matter
    Pan, Qiong
    Hu, Wenya
    He, Ding
    He, Chen
    Zhang, Linzhou
    Shi, Quan
    [J]. TALANTA, 2023, 259
  • [9] Machine learning driven by environmental covariates to estimate high-resolution PM2.5 in data-poor regions
    Jin, Xiaoye
    Ding, Jianli
    Ge, Xiangyu
    Liu, Jie
    Xie, Boqiang
    Zhao, Shuang
    Zhao, Qiaozhen
    [J]. PEERJ, 2022, 10
  • [10] Assessment for the data processing performance of non-target screening analysis based on high-resolution mass spectrometry
    Liu, He
    Wang, Rui
    Zhao, Bo
    Xie, Danping
    [J]. SCIENCE OF THE TOTAL ENVIRONMENT, 2024, 908