Chemical representation learning for toxicity prediction

Cited by: 16

Authors:

Born, Jannis [1,2]

Markert, Greta [1,3]

Janakarajan, Nikita [1,4]

Kimber, Talia B. [5]

Volkamer, Andrea [5,6]

Martinez, Maria Rodriguez [1]

Manica, Matteo [1]

Affiliations:

[1] IBM Res Europe, Zurich, Switzerland

[2] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, Zurich, Switzerland

[3] Swiss Fed Inst Technol, Dept Chem & Appl Biosci, Zurich, Switzerland

[4] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland

[5] Charite Univ med Berlin, Inst Physiol, In sil Toxicol & Struct Bioinformat, Charitepl 1, D-10117 Berlin, Germany

[6] Saarland Univ, Data Driven Drug Design, D-66123 Saarbrucken, Germany

Source:

DIGITAL DISCOVERY | 2023 / Volume 2 / Issue 03

Keywords:

STRUCTURAL ALERTS; DRUG DISCOVERY; NEURAL-NETWORK; RECEPTOR; SMILES; TOOL; P53;

DOI:

10.1039/d2dd00099g

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Undesired toxicity is a major hindrance to drug discovery and largely responsible for high attrition rates in early stages. This calls for new, reliable, and interpretable molecular property prediction models that help prioritize compounds and thus reduce the high costs of development and the risk to humans, animals, and the environment. Here, we propose an interpretable chemical language model that combines attention with multiscale convolutions and relies on data augmentation. We first benchmark various molecular representations (e.g., fingerprints, different flavors of SMILES and SELFIES, as well as graph and graph kernel methods), revealing that SMILES coupled with augmentation yields the best overall performance. Despite its simplicity, our model is then shown to outperform existing approaches across a wide range of molecular property prediction tasks, including but not limited to toxicity. Moreover, the attention weights of the model allow for easy interpretation and show enrichment of known toxicophores even without explicit supervision. To introduce a notion of model reliability, we propose and combine two simple methods for uncertainty estimation (Monte-Carlo dropout and test-time augmentation). These methods not only identify samples with high prediction uncertainty, but also allow the formation of implicit model ensembles that improve accuracy. Last, we validate our model on a large-scale proprietary toxicity dataset and find that it outperforms previous work while giving similar insights into cytotoxic substructures.

Summary: A chemical language model for molecular property prediction: it outperforms prior art, is validated on a large, proprietary toxicity dataset, reveals cytotoxic motifs through attention, and uses two uncertainty techniques to improve model reliability.
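The combination of Monte-Carlo dropout and test-time augmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_forward` is a hypothetical stand-in for a trained model (a linear score over character counts), the weights are invented for illustration, and the augmented inputs are hand-written equivalent SMILES strings for ethanol rather than the output of a SMILES enumeration tool. The sketch only shows the general idea: keep dropout active at inference, run several stochastic passes on each augmented input, and use the spread of the pooled predictions as an uncertainty estimate.

```python
import math
import random
import statistics


def toy_forward(smiles, weights, drop_p, rng):
    """Hypothetical stand-in for a trained model (assumption, not the paper's
    architecture): a sigmoid over a weighted sum of character counts, with
    inverted dropout applied to each input feature."""
    score = 0.0
    for ch, w in weights.items():
        feature = smiles.count(ch)
        # Monte-Carlo dropout: the random mask stays active at inference time.
        if rng.random() >= drop_p:
            score += w * feature / (1.0 - drop_p)
    return 1.0 / (1.0 + math.exp(-score))


def predict_with_uncertainty(smiles_variants, weights, drop_p=0.2,
                             n_passes=50, seed=0):
    """Pool predictions over augmented inputs (test-time augmentation) and
    over stochastic forward passes (Monte-Carlo dropout).

    Returns the mean prediction (an implicit ensemble) and the population
    standard deviation, used here as a simple uncertainty score."""
    rng = random.Random(seed)
    preds = [
        toy_forward(smiles, weights, drop_p, rng)
        for smiles in smiles_variants
        for _ in range(n_passes)
    ]
    return statistics.fmean(preds), statistics.pstdev(preds)


# Three equivalent SMILES strings for ethanol (hand-written augmentations).
variants = ["CCO", "OCC", "C(O)C"]
# Illustrative weights only; a real model would learn its parameters.
weights = {"C": 0.3, "O": -0.8}

mean_pred, uncertainty = predict_with_uncertainty(variants, weights)
print(f"mean prediction: {mean_pred:.3f}, uncertainty: {uncertainty:.3f}")
```

With `drop_p=0.0` the toy model becomes deterministic and, because the three strings have identical character counts, the uncertainty collapses to zero. In a real chemical language model, different SMILES strings of the same molecule would generally produce different predictions, so test-time augmentation would contribute its own share of the spread.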

Pages: 674-691

Page count: 18
