Chemical representation learning for toxicity prediction

Cited by: 16
Authors
Born, Jannis [1, 2]
Markert, Greta [1, 3]
Janakarajan, Nikita [1, 4]
Kimber, Talia B. [5]
Volkamer, Andrea [5, 6]
Martinez, Maria Rodriguez [1]
Manica, Matteo [1]
Affiliations
[1] IBM Res Europe, Zurich, Switzerland
[2] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, Zurich, Switzerland
[3] Swiss Fed Inst Technol, Dept Chem & Appl Biosci, Zurich, Switzerland
[4] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[5] Charite Univ med Berlin, Inst Physiol, In sil Toxicol & Struct Bioinformat, Charitepl 1, D-10117 Berlin, Germany
[6] Saarland Univ, Data Driven Drug Design, D-66123 Saarbrucken, Germany
Source
DIGITAL DISCOVERY | 2023 / Vol. 2 / Issue 03
Keywords
STRUCTURAL ALERTS; DRUG DISCOVERY; NEURAL-NETWORK; RECEPTOR; SMILES; TOOL; P53;
DOI
10.1039/d2dd00099g
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Undesired toxicity is a major hindrance to drug discovery and largely responsible for high attrition rates in early stages. This calls for new, reliable, and interpretable molecular property prediction models that help prioritize compounds and thus reduce the high cost of development and the risk to humans, animals, and the environment. Here, we propose an interpretable chemical language model that combines attention with multiscale convolutions and relies on data augmentation. We first benchmark various molecular representations (e.g., fingerprints, different flavors of SMILES and SELFIES, as well as graph and graph kernel methods), revealing that SMILES coupled with augmentation yields the best overall performance. Despite its simplicity, our model is then shown to outperform existing approaches across a wide range of molecular property prediction tasks, including but not limited to toxicity. Moreover, the attention weights of the model allow for easy interpretation and show enrichment of known toxicophores even without explicit supervision. To introduce a notion of model reliability, we propose and combine two simple methods for uncertainty estimation (Monte-Carlo dropout and test-time augmentation). These methods not only identify samples with high prediction uncertainty, but also allow the formation of implicit model ensembles that improve accuracy. Last, we validate our model on a large-scale proprietary toxicity dataset and find that it outperforms previous work while yielding similar insights into cytotoxic substructures.
Summary: A chemical language model for molecular property prediction. It outperforms prior art, is validated on a large, proprietary toxicity dataset, reveals cytotoxic motifs through attention, and uses two uncertainty techniques to improve model reliability.
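The abstract's idea of combining Monte-Carlo dropout with test-time augmentation into an implicit ensemble can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' model: `toy_predict` is a hypothetical stand-in classifier with made-up character weights, `ensemble_predict` is a hypothetical helper, and the three SMILES strings are hand-written equivalent spellings of ethanol (in practice a cheminformatics toolkit would generate such variants). The sketch only shows the general pattern: keep dropout active at inference, repeat the prediction over several equivalent inputs, and report the mean as the prediction and the spread as an uncertainty estimate.

```python
import math
import random
import statistics

# Hypothetical toy weights over SMILES characters; NOT the paper's model.
TOY_WEIGHTS = {"C": 0.3, "O": -0.5, "N": 0.2, "(": 0.05, ")": 0.05}


def toy_predict(smiles, rng, drop_rate=0.2):
    """Stand-in stochastic classifier: dropout stays active at inference."""
    keep = 1.0 - drop_rate
    logit = 0.0
    for ch in smiles:
        # Each character feature is kept with probability `keep`.
        if rng.random() < keep:
            # Inverted-dropout rescaling keeps the expected logit unchanged.
            logit += TOY_WEIGHTS.get(ch, 0.0) / keep
    return 1.0 / (1.0 + math.exp(-logit))


def ensemble_predict(smiles_variants, n_dropout_passes=20, seed=0):
    """Combine test-time augmentation (several SMILES of one molecule)
    with Monte-Carlo dropout (several stochastic passes per SMILES)."""
    rng = random.Random(seed)
    preds = [
        toy_predict(smiles, rng)
        for smiles in smiles_variants
        for _ in range(n_dropout_passes)
    ]
    # Mean acts as the implicit-ensemble prediction,
    # population standard deviation as a simple uncertainty score.
    return statistics.fmean(preds), statistics.pstdev(preds)


# Three equivalent SMILES spellings of ethanol, written by hand.
variants = ["CCO", "OCC", "C(O)C"]
mean, spread = ensemble_predict(variants)
print(f"mean={mean:.3f} spread={spread:.3f}")
```

A large spread relative to other molecules would flag a prediction as uncertain; the number of dropout passes and the number of SMILES variants are free choices in this sketch.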
Pages: 674 - 691
Page count: 18