A comparison of machine learning approaches for predicting hepatotoxicity potential using chemical structure and targeted transcriptomic data

Cited by: 2
Authors
Tate, Tia [1]
Patlewicz, Grace [1, 2]
Shah, Imran [1]
Affiliations
[1] US EPA, Ctr Computat Toxicol & Exposure CCTE, Durham, NC 27709 USA
[2] US EPA, Ctr Computat Toxicol & Exposure CCTE, 109 TW Alexander Dr, Res Triangle Pk, NC 27711 USA
Keywords
Generalised Read-across (GenRA); High throughput transcriptomics (HTTr); Machine Learning (ML); BIOACTIVITY
DOI
10.1016/j.comtox.2024.100301
Chinese Library Classification number
R99 [Toxicology]
Discipline classification code
100405
Abstract
Animal toxicity testing is time- and resource-intensive, making it difficult to keep pace with the number of substances requiring assessment. Machine learning (ML) models that use chemical structure information and high-throughput experimental data can be helpful in predicting potential toxicity. However, much of the toxicity data used to train ML models is biased, with an unequal balance of positives and negatives, primarily because substances selected for in vivo testing are expected to elicit some toxic effect. To investigate the impact of this bias on predictive performance, various sampling approaches were used to balance in vivo toxicity data as part of a supervised ML workflow to predict hepatotoxicity outcomes from chemical structure and/or targeted transcriptomic data. From the chronic, subchronic, developmental, multigenerational reproductive, and subacute repeat-dose testing toxicity outcomes with a minimum of 50 positive and 50 negative substances, 18 different study-toxicity outcome combinations were evaluated in up to 7 ML models. These included Artificial Neural Networks, Random Forests, Bernoulli Naive Bayes, Gradient Boosting, and Support Vector classification algorithms, which were compared with a local approach, Generalised Read-Across (GenRA), a similarity-weighted k-Nearest Neighbour (k-NN) method. The mean CV F1 performance for unbalanced data across all classifiers and descriptors for chronic liver effects was 0.735 (0.0395 SD). Mean CV F1 performance dropped to 0.639 (0.073 SD) with over-sampling approaches, though the poorer performance of k-NN approaches in some cases contributed to the observed decrease (mean CV F1 performance excluding k-NN was 0.697 (0.072 SD)). With under-sampling approaches, the mean CV F1 was 0.523 (0.083 SD). For developmental liver effects, the mean CV F1 performance was much lower, at 0.089 (0.111 SD) for unbalanced approaches and 0.149 (0.084 SD) for under-sampling. Over-sampling approaches led to an increase in mean CV F1 performance (0.234; 0.107 SD) for developmental liver toxicity. Model performance was found to be dependent on dataset, model type, balancing approach and feature selection. Accordingly, ML workflows for predicting toxicity should be tailored to account for class imbalance and should start with simple classifiers.
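The balancing step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which sampling algorithms were used, so plain random over-sampling and random under-sampling stand in here as assumptions. The function names (oversample, undersample, f1_score) and the toy labels are hypothetical and are not taken from the paper. In a cross-validation setting, balancing like this would normally be applied to the training folds only, so that the F1 score is still measured on data with the original class ratio.

```python
import random
from collections import Counter


def oversample(X, y, seed=0):
    """Randomly duplicate minority-class rows until all classes match the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)  # sample with replacement
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


def undersample(X, y, seed=0):
    """Randomly drop rows until all classes match the smallest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))  # sample without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy, made-up training fold: 8 positives and 2 negatives.
X_train = [[i] for i in range(10)]
y_train = [1] * 8 + [0] * 2

X_over, y_over = oversample(X_train, y_train)
X_under, y_under = undersample(X_train, y_train)
```

Over-sampling keeps every original row and grows the fold, while under-sampling discards majority-class rows. The smaller training set that results is one plausible reason the two strategies can score differently, although the abstract itself does not give a cause.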
Pages: 14