A comparison of machine learning approaches for predicting hepatotoxicity potential using chemical structure and targeted transcriptomic data

Cited by: 2

Authors:

Tate, Tia [1]

Patlewicz, Grace [1,2]

Shah, Imran [1]

Affiliations:

[1] US EPA, Ctr Computat Toxicol & Exposure CCTE, Durham, NC 27709 USA

[2] US EPA, Ctr Computat Toxicol & Exposure CCTE, 109 TW Alexander Dr, Res Triangle Pk, NC 27711 USA

Source:

COMPUTATIONAL TOXICOLOGY | 2024 / Vol. 29

Keywords:

Generalised Read-across (GenRA); High-throughput transcriptomics (HTTr); Machine Learning (ML); Bioactivity

DOI:

10.1016/j.comtox.2024.100301

Chinese Library Classification (CLC) number:

R99 [Toxicology]

Discipline classification code:

100405

Abstract:

Animal toxicity testing is time- and resource-intensive, making it difficult to keep pace with the number of substances requiring assessment. Machine learning (ML) models that use chemical structure information and high-throughput experimental data can help predict potential toxicity. However, much of the toxicity data used to train ML models is imbalanced, with unequal numbers of positives and negatives, primarily because substances selected for in vivo testing are expected to elicit some toxic effect. To investigate the impact of this imbalance on predictive performance, various sampling approaches were used to balance in vivo toxicity data as part of a supervised ML workflow to predict hepatotoxicity outcomes from chemical structure and/or targeted transcriptomic data. From chronic, subchronic, developmental, multigenerational reproductive, and subacute repeat-dose toxicity studies, 18 study-toxicity outcome combinations with at least 50 positive and 50 negative substances were evaluated in up to 7 ML models. These included Artificial Neural Networks, Random Forests, Bernoulli Naive Bayes, Gradient Boosting, and Support Vector classification algorithms, which were compared with a local approach, Generalised Read-Across (GenRA), a similarity-weighted k-Nearest Neighbour (k-NN) method. For chronic liver effects, the mean cross-validation (CV) F1 score for unbalanced data across all classifiers and descriptors was 0.735 (0.0395 SD). With over-sampling approaches, the mean CV F1 dropped to 0.639 (0.073 SD), although the poorer performance of k-NN approaches in some cases contributed to this decrease (mean CV F1 excluding k-NN was 0.697 (0.072 SD)). With under-sampling approaches, the mean CV F1 was 0.523 (0.083 SD). For developmental liver effects, the mean CV F1 was much lower: 0.089 (0.111 SD) for unbalanced data and 0.149 (0.084 SD) with under-sampling. Over-sampling approaches increased the mean CV F1 for developmental liver toxicity to 0.234 (0.107 SD). Model performance depended on the dataset, model type, balancing approach, and feature selection. Accordingly, ML workflows for predicting toxicity should account for class imbalance and start with simple classifiers.
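The balancing and local-prediction steps summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the specific sampling methods, so plain random over- and under-sampling stand in for them here. Fingerprints are assumed to be sets of "on" bits compared with Jaccard (Tanimoto) similarity, and `similarity_weighted_knn` is only a GenRA-style score, not the GenRA implementation or the authors' workflow. All function names and the toy data are hypothetical.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: a substance is (binary fingerprint as a set of "on" bits, 0/1 label).
Substance = Tuple[FrozenSet[int], int]


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard (Tanimoto) similarity of two binary fingerprints."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _split(data: Sequence[Substance]) -> Tuple[List[Substance], List[Substance]]:
    """Return (minority, majority) class lists; ties treat positives as majority."""
    pos = [s for s in data if s[1] == 1]
    neg = [s for s in data if s[1] == 0]
    return (pos, neg) if len(pos) < len(neg) else (neg, pos)


def random_oversample(data: Sequence[Substance], seed: int = 0) -> List[Substance]:
    """Duplicate random minority examples until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split(data)
    if not minority and majority:
        raise ValueError("both classes must be present to over-sample")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def random_undersample(data: Sequence[Substance], seed: int = 0) -> List[Substance]:
    """Keep a random majority subset the same size as the minority class."""
    rng = random.Random(seed)
    minority, majority = _split(data)
    return minority + rng.sample(majority, len(minority))


def similarity_weighted_knn(query: FrozenSet[int], train: Sequence[Substance], k: int = 3) -> float:
    """GenRA-style score: similarity-weighted mean label of the k most similar substances."""
    neighbours = sorted(train, key=lambda s: jaccard(query, s[0]), reverse=True)[:k]
    weights = [jaccard(query, fp) for fp, _ in neighbours]
    total = sum(weights)
    if total == 0:
        return 0.0  # no structural overlap with any neighbour
    return sum(w * label for w, (_, label) in zip(weights, neighbours)) / total


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN) for the positive class (0.0 when TP is 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced toy set: 4 positives, 1 negative.
    train = [
        (frozenset({1, 2, 3}), 1),
        (frozenset({1, 2, 4}), 1),
        (frozenset({2, 3, 5}), 1),
        (frozenset({1, 3, 6}), 1),
        (frozenset({7, 8, 9}), 0),
    ]
    over = random_oversample(train)
    print(len(over), sum(label for _, label in over))  # 8 4
    print(len(random_undersample(train)))  # 2
    score = similarity_weighted_knn(frozenset({1, 2}), train, k=3)
    print(score, f1_score([1, 0, 1, 1], [1, 0, 0, 1]))  # 1.0 0.8
```

Random over-sampling only duplicates existing minority examples, so it adds no new information, while under-sampling discards majority examples. In a cross-validated workflow, resampling would normally be applied to the training folds only, so that duplicated examples do not leak into the evaluation fold and inflate the CV F1.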

Pages: 14
