A comparison of machine learning approaches for predicting hepatotoxicity potential using chemical structure and targeted transcriptomic data

Cited by: 2
Authors
Tate, Tia [1]
Patlewicz, Grace [1, 2]
Shah, Imran [1]
Affiliations
[1] US EPA, Ctr Computat Toxicol & Exposure CCTE, Durham, NC 27709 USA
[2] US EPA, Ctr Computat Toxicol & Exposure CCTE, 109 TW Alexander Dr, Res Triangle Pk, NC 27711 USA
Keywords
Generalised Read-across (GenRA); High throughput transcriptomics (HTTr); Machine Learning (ML); Bioactivity
DOI
10.1016/j.comtox.2024.100301
Chinese Library Classification number
R99 [Toxicology]
Discipline classification code
100405
Abstract
Animal toxicity testing is time- and resource-intensive, making it difficult to keep pace with the number of substances requiring assessment. Machine learning (ML) models that use chemical structure information and high-throughput experimental data can be helpful in predicting potential toxicity. However, much of the toxicity data used to train ML models is biased, with an unequal balance of positives and negatives, primarily because substances selected for in vivo testing are expected to elicit some toxic effect. To investigate the impact of this bias on predictive performance, various sampling approaches were used to balance in vivo toxicity data as part of a supervised ML workflow to predict hepatotoxicity outcomes from chemical structure and/or targeted transcriptomic data. From the chronic, subchronic, developmental, multigenerational reproductive, and subacute repeat-dose toxicity testing outcomes with a minimum of 50 positive and 50 negative substances, 18 different study-toxicity outcome combinations were evaluated in up to 7 ML models. These included Artificial Neural Networks, Random Forests, Bernoulli Naive Bayes, Gradient Boosting, and Support Vector classification algorithms, which were compared with a local approach, Generalised Read-Across (GenRA), a similarity-weighted k-Nearest Neighbour (k-NN) method. The mean cross-validation (CV) F1 performance for unbalanced data across all classifiers and descriptors for chronic liver effects was 0.735 (0.0395 SD). Mean CV F1 performance dropped to 0.639 (0.073 SD) with over-sampling approaches, though the poorer performance of k-NN approaches in some cases contributed to the observed decrease (mean CV F1 performance excluding k-NN was 0.697 (0.072 SD)). With under-sampling approaches, the mean CV F1 was 0.523 (0.083 SD). For developmental liver effects, the mean CV F1 performance was much lower, at 0.089 (0.111 SD) for unbalanced approaches and 0.149 (0.084 SD) for under-sampling. Over-sampling approaches led to an increase in mean CV F1 performance (0.234 (0.107 SD)) for developmental liver toxicity. Model performance was found to be dependent on dataset, model type, balancing approach and feature selection. Accordingly, tailoring ML workflows for predicting toxicity should consider class imbalance and rely on simple classifiers first.
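The ideas named in the abstract (a similarity-weighted k-NN prediction in the spirit of GenRA, random over-sampling of the minority class, and the F1 score) can be illustrated with a minimal Python sketch. This is not the authors' workflow: the function names, the use of Jaccard similarity on sets of fingerprint bits, the choice of k, and the toy data are all assumptions made only for illustration.

```python
import random

def jaccard(a, b):
    """Jaccard similarity between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def weighted_knn_score(query, neighbours, k=3):
    """Similarity-weighted mean label of the k most similar neighbours.

    `neighbours` is a list of (fingerprint_set, label) pairs, label in {0, 1}.
    """
    scored = sorted(
        ((jaccard(query, fp), label) for fp, label in neighbours),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    return sum(sim * label for sim, label in scored) / total if total else 0.0

def random_oversample(data, seed=0):
    """Duplicate random minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [row for row in data if row[1] == 1]
    neg = [row for row in data if row[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(data)  # nothing to resample from
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(data) + extra

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical toy data: (set of fingerprint bits, hepatotoxicity label).
train = [
    ({1, 2, 3}, 1), ({1, 2, 4}, 1),
    ({7, 8, 9}, 0), ({7, 8}, 0), ({8, 9, 10}, 0), ({6, 7, 9}, 0),
]
balanced = random_oversample(train)
score = weighted_knn_score({1, 2, 3}, balanced, k=2)
```

Because F1 depends only on true positives, false positives and false negatives, it is sensitive to how the positive class is represented, which is one reason the balancing step can move the score in either direction.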
Pages: 14