Probabilistic generative transformer language models for generative design of molecules

Times cited: 4
Authors
Wei, Lai [1]
Fu, Nihang [1]
Song, Yuqi [1]
Wang, Qian [2]
Hu, Jianjun [1]
Affiliations
[1] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29201 USA
[2] Univ South Carolina, Dept Chem & Biochem, Columbia, SC 29201 USA
Funding
U.S. National Science Foundation
Keywords
Deep learning; Language models; Molecules generator; Molecules discovery; Blank filling; SMILES;
DOI
10.1186/s13321-023-00759-z
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Self-supervised neural language models have recently found wide application in the generative design of organic molecules and protein sequences, as well as in representation learning for downstream structure classification and functional prediction. However, most existing deep learning models for molecule design require large datasets and have black-box architectures, which makes it difficult to interpret their design logic. Here we propose the Generative Molecular Transformer (GMTransformer), a probabilistic neural network model for the generative design of molecules. Our model is built on the blank-filling language model originally developed for text processing, which has demonstrated unique advantages in learning "molecule grammars", with high-quality generation, interpretability, and data efficiency. Benchmarked on the MOSES datasets, our models achieve high novelty and scaffold similarity (Scaf) scores compared with other baselines. The probabilistic generation steps have potential for tinkering with molecule design, because they can recommend, with explanation, how to modify existing molecules, guided by the learned implicit molecular chemistry. The source code and datasets are freely available at https://github.com/usccolumbia/GMTransformer
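To make the idea of blank filling concrete, here is a minimal Python sketch of the canvas mechanics used by blank language models. It assumes that formulation: starting from a single blank, each step picks a blank, fills it with a token, and optionally opens a new blank to its left and/or right. The names `toy_policy`, `blank_fill_generate`, and `max_expansions` are hypothetical and chosen for illustration. The uniform random choices stand in for the probabilities a trained network such as GMTransformer would assign, so the output is not guaranteed to be a valid SMILES string.

```python
import random
from typing import List, Sequence, Tuple

BLANK = "__"  # a slot on the canvas that can still receive a token


def toy_policy(
    canvas: List[str], rng: random.Random, vocab: Sequence[str]
) -> Tuple[int, str, bool, bool]:
    """Hypothetical stand-in for a trained model: every choice is uniform at random.

    Returns (index of the blank to fill, token to place,
             open a new blank on the left?, open a new blank on the right?).
    """
    blanks = [i for i, tok in enumerate(canvas) if tok == BLANK]
    pos = rng.choice(blanks)
    token = rng.choice(list(vocab))
    return pos, token, rng.random() < 0.4, rng.random() < 0.4


def blank_fill_generate(
    vocab: Sequence[str], max_expansions: int = 10, seed: int = 0
) -> Tuple[str, List[str]]:
    """Grow a token sequence from a single blank until no blanks remain."""
    rng = random.Random(seed)
    canvas: List[str] = [BLANK]
    trace: List[str] = []  # readable log of every action, one entry per step
    step = 0
    while BLANK in canvas:
        pos, token, left, right = toy_policy(canvas, rng, vocab)
        if step >= max_expansions:
            # Simplification: stop opening blanks so the loop is guaranteed to end.
            left = right = False
        # Replace the chosen blank with the token, flanked by any newly opened blanks.
        # A blank's neighbours are always tokens (or the sequence edge), so blanks
        # never become adjacent to each other.
        canvas[pos:pos + 1] = ([BLANK] if left else []) + [token] + ([BLANK] if right else [])
        trace.append(f"step {step}: blank {pos} -> {token!r} (left={left}, right={right})")
        step += 1
    return "".join(canvas), trace


if __name__ == "__main__":
    sequence, trace = blank_fill_generate(["C", "O", "N", "="], seed=42)
    print("\n".join(trace))
    print(sequence)
```

Each step adds exactly one token, so the final sequence has one token per logged action. The `trace` list is a toy version of the step-by-step record that makes this kind of generation inspectable. In a trained model, each choice would carry a probability that could be used to rank candidate modifications. The `max_expansions` cap is an assumption added here only to guarantee termination.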
Pages: 15