Chemical representation learning for toxicity prediction

Cited by: 16
Authors
Born, Jannis [1, 2]
Markert, Greta [1, 3]
Janakarajan, Nikita [1, 4]
Kimber, Talia B. [5]
Volkamer, Andrea [5, 6]
Martinez, Maria Rodriguez [1]
Manica, Matteo [1]
Affiliations
[1] IBM Res Europe, Zurich, Switzerland
[2] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, Zurich, Switzerland
[3] Swiss Fed Inst Technol, Dept Chem & Appl Biosci, Zurich, Switzerland
[4] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[5] Charite Univ med Berlin, Inst Physiol, In sil Toxicol & Struct Bioinformat, Charitepl 1, D-10117 Berlin, Germany
[6] Saarland Univ, Data Driven Drug Design, D-66123 Saarbrucken, Germany
Source
DIGITAL DISCOVERY | 2023 / Volume 2 / Issue 03
Keywords
STRUCTURAL ALERTS; DRUG DISCOVERY; NEURAL-NETWORK; RECEPTOR; SMILES; TOOL; P53
DOI
10.1039/d2dd00099g
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Undesired toxicity is a major hindrance to drug discovery and largely responsible for high attrition rates in early stages. This calls for new, reliable, and interpretable molecular property prediction models that help prioritize compounds and thus reduce the high costs of development and the risk to humans, animals, and the environment. Here, we propose an interpretable chemical language model that combines attention with multiscale convolutions and relies on data augmentation. We first benchmark various molecular representations (e.g., fingerprints, different flavors of SMILES and SELFIES, as well as graph and graph kernel methods), revealing that SMILES coupled with augmentation yields the best overall performance. Despite its simplicity, our model is then shown to outperform existing approaches across a wide range of molecular property prediction tasks, including but not limited to toxicity. Moreover, the attention weights of the model allow for easy interpretation and show enrichment of known toxicophores even without explicit supervision. To introduce a notion of model reliability, we propose and combine two simple methods for uncertainty estimation (Monte Carlo dropout and test-time augmentation). These methods not only identify samples with high prediction uncertainty, but also allow the formation of implicit model ensembles that improve accuracy. Last, we validate our model on a large-scale proprietary toxicity dataset and find that it outperforms previous work while yielding similar insights into cytotoxic substructures.

A chemical language model for molecular property prediction: it outperforms prior art, is validated on a large, proprietary toxicity dataset, reveals cytotoxic motifs through attention, and uses two uncertainty techniques to improve model reliability.
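The combination of Monte Carlo dropout and test-time augmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_predict` is a hypothetical stand-in for a trained model whose dropout stays active at inference, its weights are invented for illustration, and the three strings are hand-written equivalent SMILES for ethanol rather than the output of an augmentation library. The sketch only shows the general idea: every SMILES variant is scored over several stochastic passes, the mean acts as an implicit ensemble prediction, and the spread serves as an uncertainty estimate.

```python
import math
import random
import statistics


def toy_predict(smiles, rng, dropout_p=0.2):
    """Hypothetical stand-in for a trained model with dropout left active.

    Each character contributes a position-dependent weight, so different
    SMILES strings of the same molecule give different raw scores.
    Dropout randomly skips contributions and rescales the kept ones.
    """
    weights = {"C": 0.3, "O": -0.5, "N": 0.4}  # assumed toy weights
    total = 0.0
    for pos, ch in enumerate(smiles):
        if rng.random() < dropout_p:
            continue  # this contribution is dropped in this pass
        total += weights.get(ch, 0.0) * (1.0 + 0.1 * pos) / (1.0 - dropout_p)
    return 1.0 / (1.0 + math.exp(-total))  # squash to a probability


def predict_with_uncertainty(smiles_variants, n_dropout_passes=20, seed=0):
    """Combine test-time augmentation with Monte Carlo dropout.

    Returns the mean prediction (implicit ensemble) and the sample
    standard deviation (uncertainty estimate) over all passes.
    """
    rng = random.Random(seed)
    preds = [
        toy_predict(smiles, rng)
        for smiles in smiles_variants       # test-time augmentation
        for _ in range(n_dropout_passes)    # Monte Carlo dropout
    ]
    return statistics.mean(preds), statistics.stdev(preds)


# Three equivalent SMILES strings for ethanol, written by hand.
ethanol_variants = ["CCO", "OCC", "C(O)C"]
mean_pred, spread = predict_with_uncertainty(ethanol_variants)
print(f"ensemble prediction: {mean_pred:.3f}, uncertainty: {spread:.3f}")
```

In such a scheme, samples with a large spread could be flagged for manual review, while the mean replaces the single deterministic prediction.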
Pages: 674-691
Page count: 18