A comparison of machine learning approaches for predicting hepatotoxicity potential using chemical structure and targeted transcriptomic data

Cited by: 2
Authors
Tate, Tia [1]
Patlewicz, Grace [1, 2]
Shah, Imran [1]
Affiliations
[1] US EPA, Ctr Computat Toxicol & Exposure CCTE, Durham, NC 27709 USA
[2] US EPA, Ctr Computat Toxicol & Exposure CCTE, 109 TW Alexander Dr, Res Triangle Pk, NC 27711 USA
Keywords
Generalised Read-across (GenRA); High throughput transcriptomics (HTTr); Machine Learning (ML); Bioactivity
DOI
10.1016/j.comtox.2024.100301
Chinese Library Classification number
R99 [Toxicology]
Discipline classification code
100405
Abstract
Animal toxicity testing is time- and resource-intensive, making it difficult to keep pace with the number of substances requiring assessment. Machine learning (ML) models that use chemical structure information and high-throughput experimental data can help predict potential toxicity. However, much of the toxicity data used to train ML models is imbalanced, with unequal numbers of positives and negatives, primarily because substances selected for in vivo testing are expected to elicit some toxic effect. To investigate the impact of this bias on predictive performance, various sampling approaches were used to balance in vivo toxicity data as part of a supervised ML workflow to predict hepatotoxicity outcomes from chemical structure and/or targeted transcriptomic data. From the chronic, subchronic, developmental, multigenerational reproductive, and subacute repeat-dose toxicity testing outcomes with a minimum of 50 positive and 50 negative substances, 18 different study-toxicity outcome combinations were evaluated in up to 7 ML models. These included Artificial Neural Networks, Random Forests, Bernoulli Naive Bayes, Gradient Boosting, and Support Vector classification algorithms, which were compared with a local approach, Generalised Read-Across (GenRA), a similarity-weighted k-Nearest Neighbour (k-NN) method. The mean cross-validation (CV) F1 performance for unbalanced data across all classifiers and descriptors for chronic liver effects was 0.735 (0.0395 SD). Mean CV F1 performance dropped to 0.639 (0.073 SD) with over-sampling approaches, though the poorer performance of k-NN approaches in some cases contributed to the observed decrease (mean CV F1 performance excluding k-NN was 0.697 (0.072 SD)). With under-sampling approaches, the mean CV F1 was 0.523 (0.083 SD). For developmental liver effects, the mean CV F1 performance was much lower: 0.089 (0.111 SD) for unbalanced approaches and 0.149 (0.084 SD) for under-sampling. Over-sampling approaches led to an increase in mean CV F1 performance (0.234 (0.107 SD)) for developmental liver toxicity. Model performance was found to depend on dataset, model type, balancing approach, and feature selection. Accordingly, tailoring ML workflows for predicting toxicity should consider class imbalance and rely on simple classifiers first.
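The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' workflow: the function names, the toy fingerprints, the Jaccard similarity, the 0.5 decision threshold, and the single train/test split (instead of cross-validation) are all assumptions made for illustration. It shows random over- and under-sampling of a training set, a similarity-weighted k-NN score in the spirit of a read-across prediction, and the F1 metric.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_knn_score(query, train, k=3):
    """Similarity-weighted mean label of the k most similar training items."""
    scored = [(jaccard(query, fp), y) for fp, y in train]
    nearest = sorted(scored, key=lambda t: t[0], reverse=True)[:k]
    total = sum(s for s, _ in nearest)
    return sum(s * y for s, y in nearest) / total if total else 0.0


def resample(train, mode, rng):
    """Balance classes by random over-sampling ('over') or under-sampling ('under')."""
    pos = [r for r in train if r[1] == 1]
    neg = [r for r in train if r[1] == 0]
    minority, majority = sorted((pos, neg), key=len)
    if not minority or mode not in ("over", "under"):
        return list(train)  # nothing to balance, or unbalanced baseline
    if mode == "over":
        # Duplicate random minority items until both classes are equal in size.
        extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
        return majority + minority + extra
    # Under-sampling: keep a random subset of the majority class.
    return minority + rng.sample(majority, len(minority))


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy, made-up data: (set of "on" bits, label); 1 = positive, 0 = negative.
    train = [({1, 2, 3}, 1), ({1, 2, 4}, 1), ({1, 3, 4}, 1), ({2, 3, 4}, 1),
             ({1, 2, 5}, 1), ({7, 8, 9}, 0), ({6, 8, 9}, 0)]
    test = [({1, 2, 3, 4}, 1), ({7, 8}, 0), ({6, 7, 9}, 0), ({2, 3, 5}, 1)]
    y_true = [y for _, y in test]
    for mode in ("none", "over", "under"):
        balanced = resample(train, mode, rng)
        y_pred = [int(weighted_knn_score(fp, balanced, k=3) >= 0.5) for fp, _ in test]
        print(mode, round(f1_score(y_true, y_pred), 3))
```

In a fuller workflow, resampling would be applied only to the training folds within each cross-validation split, so that duplicated or discarded items never leak into the evaluation fold.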
Pages: 14