Machine Learning-Based Hazard-Driven Prioritization of Features in Nontarget Screening of Environmental High-Resolution Mass Spectrometry Data

Cited by: 18
Authors
Arturi, Katarzyna [1]
Hollender, Juliane [1, 2]
Affiliations
[1] Swiss Fed Inst Aquat Sci & Technol Eawag, Dept Environm Chem, CH-8600 Dubendorf, Switzerland
[2] Eidgenoss TH Zurich ETH Zurich, Inst Biogeochem & Pollut Dynam, CH-8092 Zurich, Switzerland
Keywords
ToxCast; Tox21; toxicity prediction; HRMS; MS; supervised classification; extreme gradient boosting; SIRIUS; IN-VITRO; PREDICTION; CHEMISTRY; TOXICITY; LIBRARY; MODELS; ASSAY
DOI
10.1021/acs.est.3c00304
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
MLinvitroTox maps toxicologically relevant pollution in aquatic environments by predicting the toxicity of unidentified NTS HRMS/MS features from fragmentation spectra via machine learning. Nontarget high-resolution mass spectrometry screening (NTS HRMS/MS) can detect thousands of organic substances in environmental samples. However, new strategies are needed to focus time-intensive identification efforts on features with the highest potential to cause adverse effects instead of the most abundant ones. To address this challenge, we developed MLinvitroTox, a machine learning framework that uses molecular fingerprints derived from fragmentation spectra (MS2) for a rapid classification of thousands of unidentified HRMS/MS features as toxic/nontoxic based on nearly 400 target-specific and over 100 cytotoxic endpoints from ToxCast/Tox21. Model development results demonstrated that using customized molecular fingerprints and models, over a quarter of toxic endpoints and the majority of the associated mechanistic targets could be accurately predicted with sensitivities exceeding 0.95. Notably, SIRIUS molecular fingerprints and XGBoost (Extreme Gradient Boosting) models with SMOTE (Synthetic Minority Oversampling Technique) for handling data imbalance were a universally successful and robust modeling configuration. Validation of MLinvitroTox on MassBank spectra showed that toxicity could be predicted from molecular fingerprints derived from MS2 with an average balanced accuracy of 0.75. By applying MLinvitroTox to environmental HRMS/MS data, we confirmed the experimental results obtained with target analysis and narrowed the analytical focus from tens of thousands of detected signals to 783 features linked to potential toxicity, including 109 spectral matches and 30 compounds with confirmed toxic activity.
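The abstract reports performance as sensitivity and balanced accuracy and names SMOTE as the remedy for class imbalance. The following is a minimal, standard-library-only sketch of those two ideas, not the authors' implementation: the function names `balanced_accuracy` and `smote_like`, the toy labels, and the toy three-dimensional vectors are all hypothetical, and a real workflow would more likely rely on established libraries for gradient boosting and oversampling.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling (illustrative): each synthetic point lies on the
    segment between a random minority sample and one of its k nearest
    minority-class neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        # Rank the remaining minority samples by squared Euclidean distance.
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data: 2 "toxic" (1) versus 8 "nontoxic" (0) labels.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_nontoxic = [0] * 10
print(balanced_accuracy(y_true, always_nontoxic))  # 0.5, although plain accuracy is 0.8

# Toy fingerprint-like vectors for the minority class (hypothetical values).
minority = [[0.9, 0.1, 0.8], [0.7, 0.2, 0.9], [0.8, 0.0, 1.0]]
extra = smote_like(minority, n_new=5)
```

The always-nontoxic predictor shows why balanced accuracy is the more informative metric on imbalanced endpoints: it misses every toxic case yet would still score 0.8 on plain accuracy.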
Pages: 18067-18079
Page count: 13