Chemical representation learning for toxicity prediction

Cited by: 16
Authors
Born, Jannis [1, 2]
Markert, Greta [1, 3]
Janakarajan, Nikita [1, 4]
Kimber, Talia B. [5]
Volkamer, Andrea [5, 6]
Martinez, Maria Rodriguez [1]
Manica, Matteo [1]
Affiliations
[1] IBM Res Europe, Zurich, Switzerland
[2] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, Zurich, Switzerland
[3] Swiss Fed Inst Technol, Dept Chem & Appl Biosci, Zurich, Switzerland
[4] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[5] Charite Univ med Berlin, Inst Physiol, In sil Toxicol & Struct Bioinformat, Charitepl 1, D-10117 Berlin, Germany
[6] Saarland Univ, Data Driven Drug Design, D-66123 Saarbrucken, Germany
Source
DIGITAL DISCOVERY | 2023 / Volume 2 / Issue 03
Keywords
STRUCTURAL ALERTS; DRUG DISCOVERY; NEURAL-NETWORK; RECEPTOR; SMILES; TOOL; P53;
DOI
10.1039/d2dd00099g
CLC number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Undesired toxicity is a major hindrance to drug discovery and largely responsible for high attrition rates in early stages. This calls for new, reliable, and interpretable molecular property prediction models that help prioritize compounds and thus reduce the high costs for development and the risk to humans, animals, and the environment. Here, we propose an interpretable chemical language model that combines attention with multiscale convolutions and relies on data augmentation. We first benchmark various molecular representations (e.g., fingerprints, different flavors of SMILES and SELFIES, as well as graph and graph kernel methods) revealing that SMILES coupled with augmentation overall yields the best performance. Despite its simplicity, our model is then shown to outperform existing approaches across a wide range of molecular property prediction tasks, including but not limited to toxicity. Moreover, the attention weights of the model allow for easy interpretation and show enrichment of known toxicophores even without explicit supervision. To introduce a notion of model reliability, we propose and combine two simple methods for uncertainty estimation (Monte-Carlo dropout and test-time-augmentation). These methods not only identify samples with high prediction uncertainty, but also allow formation of implicit model ensembles that improve accuracy. Last, we validate our model on a large-scale proprietary toxicity dataset and find that it outperforms previous work while giving similar insights into revealing cytotoxic substructures.
A chemical language model for molecular property prediction: it outperforms prior art, is validated on a large, proprietary toxicity dataset, reveals cytotoxic motifs through attention & uses two uncertainty techniques to improve model reliability.
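The abstract names two uncertainty techniques, Monte-Carlo dropout and test-time augmentation, without giving details in this record. The following is a minimal Python sketch of how the two could be combined, not the authors' implementation. Everything in it is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a trained network, its weights are arbitrary, and the equivalent SMILES strings for ethanol are written by hand rather than generated with a cheminformatics toolkit.

```python
import math
import random
import statistics


def toy_model(smiles, rng, dropout_p=0.2):
    """Hypothetical stand-in for a trained toxicity classifier.

    Scores each character with an arbitrary weight scaled by its position,
    so different SMILES strings of the same molecule give different outputs.
    """
    weights = {"C": 0.3, "O": -0.5, "N": 0.4, "(": 0.1, ")": 0.1}
    score = 0.0
    for pos, ch in enumerate(smiles):
        # Monte-Carlo dropout: dropout stays active at inference time,
        # so each call randomly skips some features.
        if rng.random() < dropout_p:
            continue
        # Inverted-dropout rescaling keeps the expected score unchanged.
        score += weights.get(ch, 0.0) / (1.0 - dropout_p) / (pos + 1)
    return 1.0 / (1.0 + math.exp(-score))  # squash to a pseudo-probability


def predict_with_uncertainty(variants, n_passes=20, seed=0):
    """Combine test-time augmentation and Monte-Carlo dropout.

    `variants` holds equivalent SMILES strings of one molecule (the
    augmentation); each is passed through the model `n_passes` times.
    The mean acts as an implicit-ensemble prediction and the standard
    deviation as an uncertainty estimate.
    """
    rng = random.Random(seed)
    preds = [toy_model(s, rng) for s in variants for _ in range(n_passes)]
    return statistics.mean(preds), statistics.stdev(preds)


# Three hand-written SMILES strings that all denote ethanol.
mean, std = predict_with_uncertainty(["CCO", "OCC", "C(O)C"])
print(f"prediction={mean:.3f} uncertainty={std:.3f}")
```

Pooling all stochastic passes over all augmented inputs into one mean and one standard deviation is only one possible way to combine the two methods; the paper itself should be consulted for the procedure actually used.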
Pages: 674-691
Page count: 18