Chemical representation learning for toxicity prediction

Cited by: 16
Authors
Born, Jannis [1 ,2 ]
Markert, Greta [1 ,3 ]
Janakarajan, Nikita [1 ,4 ]
Kimber, Talia B. [5 ]
Volkamer, Andrea [5 ,6 ]
Martinez, Maria Rodriguez [1 ]
Manica, Matteo [1 ]
Affiliations
[1] IBM Res Europe, Zurich, Switzerland
[2] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, Zurich, Switzerland
[3] Swiss Fed Inst Technol, Dept Chem & Appl Biosci, Zurich, Switzerland
[4] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[5] Charite Univ med Berlin, Inst Physiol, In sil Toxicol & Struct Bioinformat, Charitepl 1, D-10117 Berlin, Germany
[6] Saarland Univ, Data Driven Drug Design, D-66123 Saarbrucken, Germany
Source
DIGITAL DISCOVERY | 2023 / Vol. 2 / Issue 03
Keywords
STRUCTURAL ALERTS; DRUG DISCOVERY; NEURAL-NETWORK; RECEPTOR; SMILES; TOOL; P53
DOI
10.1039/d2dd00099g
Chinese Library Classification (CLC) code
O6 [Chemistry]
Discipline classification code
0703
Abstract
Undesired toxicity is a major hindrance to drug discovery and largely responsible for high attrition rates in early stages. This calls for new, reliable, and interpretable molecular property prediction models that help prioritize compounds and thus reduce the high costs for development and the risk to humans, animals, and the environment. Here, we propose an interpretable chemical language model that combines attention with multiscale convolutions and relies on data augmentation. We first benchmark various molecular representations (e.g., fingerprints, different flavors of SMILES and SELFIES, as well as graph and graph kernel methods), revealing that SMILES coupled with augmentation overall yields the best performance. Despite its simplicity, our model is then shown to outperform existing approaches across a wide range of molecular property prediction tasks, including but not limited to toxicity. Moreover, the attention weights of the model allow for easy interpretation and show enrichment of known toxicophores even without explicit supervision. To introduce a notion of model reliability, we propose and combine two simple methods for uncertainty estimation (Monte-Carlo dropout and test-time-augmentation). These methods not only identify samples with high prediction uncertainty, but also allow formation of implicit model ensembles that improve accuracy. Last, we validate our model on a large-scale proprietary toxicity dataset and find that it outperforms previous work while giving similar insights into revealing cytotoxic substructures.

A chemical language model for molecular property prediction: it outperforms prior art, is validated on a large, proprietary toxicity dataset, reveals cytotoxic motifs through attention, and uses two uncertainty techniques to improve model reliability.
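The abstract names two uncertainty estimators, Monte-Carlo dropout and test-time augmentation, whose combination yields both an uncertainty score and an implicit ensemble. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `toy_forward` is a hypothetical linear scorer standing in for the trained chemical language model, and each entry in `views` is a hypothetical pre-computed feature vector standing in for one augmented SMILES string of the same molecule. All names (`mc_dropout_tta`, `n_passes`, `dropout_p`) are illustrative.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def toy_forward(features: Sequence[float], weights: Sequence[float],
                dropout_p: float, rng: random.Random) -> float:
    """Hypothetical stand-in for a trained network: a linear scorer whose
    inputs pass through inverted dropout that stays ACTIVE at inference.
    Each input is dropped with probability dropout_p; survivors are scaled
    by 1 / (1 - dropout_p) so the expected score is unchanged."""
    keep = 1.0 - dropout_p
    score = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:
            score += x * w / keep
    return sigmoid(score)


def mc_dropout_tta(views: List[Sequence[float]], weights: Sequence[float],
                   n_passes: int = 20, dropout_p: float = 0.2,
                   seed: int = 0) -> Tuple[float, float]:
    """Combine test-time augmentation (several encoded views of one molecule)
    with Monte-Carlo dropout (n_passes stochastic passes per view).
    Returns (mean, std) over all len(views) * n_passes predictions: the mean
    acts as an implicit-ensemble prediction, the std as an uncertainty score."""
    if not views or n_passes < 1:
        raise ValueError("need at least one view and one pass")
    if not 0.0 <= dropout_p < 1.0:
        raise ValueError("dropout_p must be in [0, 1)")
    rng = random.Random(seed)
    preds = [toy_forward(v, weights, dropout_p, rng)
             for v in views for _ in range(n_passes)]
    return statistics.mean(preds), statistics.pstdev(preds)


if __name__ == "__main__":
    # Hypothetical feature vectors standing in for three augmented SMILES
    # of the same molecule (featurisation is out of scope here).
    views = [[1.0, 0.5, -0.2], [0.9, 0.6, -0.1], [1.1, 0.4, -0.3]]
    weights = [0.8, -0.5, 1.2]
    mean, std = mc_dropout_tta(views, weights)
    print(f"p(toxic) ~ {mean:.3f}, uncertainty (std) = {std:.3f}")
```

Averaging over both augmented views and dropout passes is what turns the two uncertainty methods into an implicit ensemble, while a large standard deviation flags a compound whose prediction deserves less trust. In a real network, dropout would be kept active in the model's own layers rather than applied to the inputs as in this toy scorer.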
Pages: 674-691
Page count: 18
Related papers
50 in total
  • [21] In Silico Prediction of Chemical Acute Dermal Toxicity Using Explainable Machine Learning Methods
    Lou, Shang
    Yu, Zhuohang
    Huang, Zejun
    Wang, Haoqiang
    Pan, Fei
    Li, Weihua
    Liu, Guixia
    Tang, Yun
    CHEMICAL RESEARCH IN TOXICOLOGY, 2024, 37 (03) : 513 - 524
  • [22] Prediction of protein pKa with representation learning
    Gokcan, Hatice
    Isayev, Olexandr
    CHEMICAL SCIENCE, 2022, 13 (08) : 2462 - 2474
  • [23] Chemical toxicity prediction based on semi-supervised learning and graph convolutional neural network
    Chen, Jiarui
    Si, Yain-Whar
    Un, Chon-Wai
    Siu, Shirley W. I.
    JOURNAL OF CHEMINFORMATICS, 2021, 13 (01)
  • [24] Machine learning-driven oral-to-inhalation extrapolation for chemical toxicity value prediction
    Matsumura, K.
    TOXICOLOGY LETTERS, 2024, 399 : S256 - S257
  • [25] In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts
    Yang, Hongbin
    Sun, Lixia
    Li, Weihua
    Liu, Guixia
    Tang, Yun
    FRONTIERS IN CHEMISTRY, 2018, 6
  • [27] A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example
    Lin, Run-Hsin
    Lin, Pinpin
    Wang, Chia-Chi
    Tung, Chun-Wei
    JOURNAL OF CHEMINFORMATICS, 2024, 16 (01)
  • [28] In silico prediction of chemical acute contact toxicity on honey bees via machine learning methods
    Xu, Xuan
    Zhao, Piaopiao
    Wang, Zhiyuan
    Zhang, Xiaoxiao
    Wu, Zengrui
    Li, Weihua
    Tang, Yun
    Liu, Guixia
    TOXICOLOGY IN VITRO, 2021, 72
  • [29] INVESTIGATION OF THE LEARNING PROCESS OF DIVERSE COMPOUND STRUCTURES BY CHEMICAL LANGUAGE MODELS TOWARD TOXICITY PREDICTION
    Mizuno, Tadahaya
    Yoshikai, Yasuhiro
    Nemoto, Shumpei
    Kusuhara, Hiroyuki
    DRUG METABOLISM AND PHARMACOKINETICS, 2024, 55
  • [30] Co-model for chemical toxicity prediction based on multi-task deep learning
    Li, Yuan
    Chen, Lingfeng
    Pu, Chengtao
    Zang, Chengdong
    Yan, YingChao
    Chen, Yadong
    Zhang, Yanmin
    Liu, Haichun
    MOLECULAR INFORMATICS, 2023, 42 (05)