MLinvitroTox reloaded for high-throughput hazard-based prioritization of high-resolution mass spectrometry data

Times cited: 1
Authors
Arturi, Katarzyna [1]
Harris, Eliza J. [2, 3]
Gasser, Lilian [2]
Escher, Beate I. [4]
Braun, Georg [4]
Bosshard, Robin [5]
Hollender, Juliane [1, 6]
Affiliations
[1] Swiss Fed Inst Aquat Sci & Technol Eawag, Dept Environm Chem, Uberlandstr 133, CH-8600 Dubendorf, Switzerland
[2] Swiss Data Sci Ctr SDSC, Andreasstr 5, CH-8092 Zurich, Switzerland
[3] Univ Bern, Climate & Environm Phys Div, Sidlerstr 5, CH-3012 Bern, Switzerland
[4] UFZ Helmholtz Ctr Environm Res, Cell Toxicol, Permoserstr 15, D-04318 Leipzig, Germany
[5] Eidgenoss TH Zurich ETH Zurich, Dept Comp Sci, Univ Str 6, CH-8092 Zurich, Switzerland
[6] Eidgenoss TH Zurich ETH Zurich, Inst Biogeochem & Pollut Dynam, Ramistr 101, CH-8092 Zurich, Switzerland
Source
JOURNAL OF CHEMINFORMATICS | 2025 / Vol. 17 / Issue 01
Keywords
ToxCast; Tox21; Toxicity; In vitro assay; Activity prediction; HRMS/MS; Binary classification; XGBoost; SIRIUS; 21ST-CENTURY; TOXICOPHORES; CHEMICALS; EXPOSURE
DOI
10.1186/s13321-025-00950-4
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
MLinvitroTox is an automated Python pipeline developed for high-throughput, hazard-driven prioritization of toxicologically relevant signals detected in complex environmental samples by high-resolution tandem mass spectrometry (HRMS/MS). MLinvitroTox is a machine learning (ML) framework comprising 490 independent XGBoost classifiers trained on molecular fingerprints derived from chemical structures and on target-specific endpoints from the ToxCast/Tox21 invitroDBv4.1 database. For each analyzed HRMS feature, MLinvitroTox generates a 490-bit bioactivity fingerprint that serves as the basis for prioritization, focusing time-consuming molecular identification efforts on the features most likely to cause adverse effects. The practical advantages of MLinvitroTox are demonstrated on groundwater HRMS data. Among the 874 features for which molecular fingerprints were derived from spectra (630 nontargets, 185 spectral matches, and 59 targets), approximately 4% of the feature/endpoint pairs were predicted to be active. Cross-checking the predictions for targets and spectral matches against invitroDB data confirmed 120 active and 6791 nonactive pairs, while 88 active and 56 nonactive pairs were mislabeled. Filtering by bioactivity probability, endpoint scores, and similarity to the training data reduced the number of potentially toxic features by at least one order of magnitude. This refinement makes analytical confirmation of the toxicologically most relevant features feasible, offering significant benefits for cost-efficient chemical risk assessment.
Scientific Contribution: In contrast to classical ML-based approaches to toxicity prediction, MLinvitroTox predicts bioactivity for HRMS features (i.e., distinct m/z signals) from MS2 fragmentation spectra rather than from the chemical structures of identified features.
While the original proof-of-concept study was accompanied by the release of an MLinvitroTox v1 KNIME workflow, this study releases a Python package, MLinvitroTox v2, which, in addition to automation, extends functionality to predicting toxicity from structures, cleaning up and generating chemical fingerprints, customizing models, and retraining on custom data. Furthermore, owing to improvements in bioactivity data processing, implemented in the concurrently released pytcpl Python package for custom processing of the invitroDBv4.1 data used to train MLinvitroTox, the current release improves model accuracy, coverage of biological mechanistic targets, and overall interpretability.
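The prioritization logic described in the abstract (one probability per endpoint classifier, binarized into a bioactivity fingerprint, then filtered by bioactivity probability, endpoint score, and similarity to the training data) can be illustrated with a minimal sketch. This is not the MLinvitroTox v2 API: the names `Feature`, `bioactivity_fingerprint`, and `prioritize`, the threshold values, the use of Tanimoto similarity on fingerprint bit sets as the training-data similarity measure, and the reading of "endpoint score" as a per-model quality score are all hypothetical assumptions. Three endpoints stand in for the 490.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical cut-offs for illustration only; NOT MLinvitroTox defaults.
PROB_THRESHOLD = 0.5           # probability at or above which an endpoint bit is set
MIN_ENDPOINT_SCORE = 0.7       # assumed per-endpoint model quality score
MIN_TRAINING_SIMILARITY = 0.3  # assumed applicability-domain cut-off


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Feature:
    feature_id: str
    fingerprint: set[int]       # molecular fingerprint bits (e.g. predicted from an MS2 spectrum)
    probabilities: list[float]  # one bioactivity probability per endpoint classifier


def bioactivity_fingerprint(probs: list[float], threshold: float = PROB_THRESHOLD) -> list[int]:
    """Binarize per-endpoint probabilities into a bioactivity bit vector."""
    return [1 if p >= threshold else 0 for p in probs]


def prioritize(features, training_fps, endpoint_scores,
               prob_threshold=PROB_THRESHOLD,
               min_score=MIN_ENDPOINT_SCORE,
               min_similarity=MIN_TRAINING_SIMILARITY):
    """Return (feature_id, active endpoint indices) for features passing all three filters."""
    prioritized = []
    for feat in features:
        # Similarity to the training data: nearest neighbour by Tanimoto.
        similarity = max((tanimoto(feat.fingerprint, fp) for fp in training_fps), default=0.0)
        if similarity < min_similarity:
            continue  # outside the assumed applicability domain
        bits = bioactivity_fingerprint(feat.probabilities, prob_threshold)
        # Keep predicted-active endpoints whose models score well enough.
        hits = [i for i, bit in enumerate(bits) if bit and endpoint_scores[i] >= min_score]
        if hits:
            prioritized.append((feat.feature_id, hits))
    return prioritized


if __name__ == "__main__":
    training = [{1, 2, 3, 4}, {10, 11, 12}]  # toy training-set fingerprints
    scores = [0.9, 0.5, 0.8]                 # one score per endpoint (3 instead of 490)
    features = [
        Feature("F1", {1, 2, 3}, [0.92, 0.81, 0.10]),
        Feature("F2", {20, 21}, [0.99, 0.99, 0.99]),  # dissimilar to training data
        Feature("F3", {10, 11}, [0.20, 0.30, 0.40]),  # no endpoint predicted active
    ]
    print(prioritize(features, training, scores))  # [('F1', [0])]
```

In this toy run, F2 is dropped despite high probabilities because it lies outside the assumed applicability domain, and F1 keeps only endpoint 0 because endpoint 1's model score falls below the cut-off. Applying the three filters in sequence is what shrinks the candidate list.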
Pages: 20