Probabilistic generative transformer language models for generative design of molecules

Cited: 4
Authors
Wei, Lai [1 ]
Fu, Nihang [1 ]
Song, Yuqi [1 ]
Wang, Qian [2 ]
Hu, Jianjun [1 ]
Affiliations
[1] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29201 USA
[2] Univ South Carolina, Dept Chem & Biochem, Columbia, SC 29201 USA
Funding
US National Science Foundation;
Keywords
Deep learning; Language models; Molecules generator; Molecules discovery; Blank filling; SMILES;
DOI
10.1186/s13321-023-00759-z
Chinese Library Classification (CLC)
O6 [Chemistry];
Discipline code
0703;
Abstract
Self-supervised neural language models have recently found wide application in the generative design of organic molecules and protein sequences, as well as in representation learning for downstream structure classification and function prediction. However, most existing deep learning models for molecule design require large training datasets and have black-box architectures, which makes their design logic difficult to interpret. Here we propose the Generative Molecular Transformer (GMTransformer), a probabilistic neural network model for the generative design of molecules. Our model is built on the blank-filling language model originally developed for text processing, which has demonstrated unique advantages in learning "molecule grammars" with high-quality generation, interpretability, and data efficiency. Benchmarked on the MOSES datasets, our models achieve high novelty and Scaf scores compared with other baselines. The probabilistic generation steps have potential for tinkering with molecule design because they can recommend how to modify existing molecules with explanations, guided by the learned implicit molecular chemistry. The source code and datasets can be accessed freely at https://github.com/usccolumbia/GMTransformer
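The blank-filling generation described in the abstract can be sketched in a toy form. The tokenizer regex, the `propose` interface, and the stand-in scorer below are illustrative assumptions for exposition only, not the authors' GMTransformer code:

```python
import re

# Toy sketch of blank-filling generation over SMILES tokens.
# This is NOT the GMTransformer implementation -- only an illustration of
# the idea: a token sequence with blanks ('_') is completed step by step,
# and each fill carries a probability that can be inspected afterwards,
# which is where the interpretability of such models comes from.

SMILES_TOKEN = re.compile(r"Cl|Br|\[[^\]]+\]|[BCNOPSFIbcnops]|\d|[()=#+-]")

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_TOKEN.findall(smiles)

def fill_blanks(tokens, propose):
    """Fill every '_' blank using `propose(left, right)`, which returns a
    (token, probability) pair; record each choice for explanation."""
    out, trace = [], []
    for i, tok in enumerate(tokens):
        if tok == "_":
            choice, prob = propose(out, tokens[i + 1:])
            out.append(choice)
            trace.append((i, choice, prob))
        else:
            out.append(tok)
    return "".join(out), trace

# Stand-in "model": always proposes an aromatic carbon with probability 0.9.
demo_propose = lambda left, right: ("c", 0.9)

# Benzene with one atom blanked out: c1cc_cc1 -> c1ccccc1
tokens = tokenize("c1cc") + ["_"] + tokenize("cc1")
filled, trace = fill_blanks(tokens, demo_propose)
print(filled)   # c1ccccc1
print(trace)    # [(4, 'c', 0.9)]
```

In a trained model, `propose` would be the network's conditional distribution over the vocabulary given both contexts; the recorded `(position, token, probability)` trace is what makes each generation step explainable.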
Pages: 15
Related papers
50 items in total
  • [1] Probabilistic generative transformer language models for generative design of molecules
    Lai Wei
    Nihang Fu
    Yuqi Song
    Qian Wang
    Jianjun Hu
    [J]. Journal of Cheminformatics, 15
  • [2] Towards developing probabilistic generative models for reasoning with natural language representations
    Marcu, D
    Popescu, AM
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2005, 3406 : 88 - 99
  • [3] A Language for Counterfactual Generative Models
    Tavares, Zenna
    Koppel, James
    Zhang, Xin
    Das, Ria
    Solar-Lezama, Armando
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7180 - 7191
  • [4] On Memorization in Probabilistic Deep Generative Models
    van den Burg, Gerrit J. J.
    Williams, Christopher K. I.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [5] A generative probabilistic framework for learning spatial language
    Dawson, Colin R.
    Wright, Jeremy
    Rebguns, Antons
    Escarcega, Marco Valenzuela
    Fried, Daniel
    Cohen, Paul R.
    [J]. 2013 IEEE THIRD JOINT INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (ICDL), 2013,
  • [6] Reflection on the use of Generative Language Models as a tool for teaching design
    do Amaral, Ines
    [J]. VIII IEEE WORLD ENGINEERING EDUCATION CONFERENCE, EDUNINE 2024, 2024,
  • [7] The Synthesizability of Molecules Proposed by Generative Models
    Gao, Wenhao
    Coley, Connor W.
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2020, 60 (12) : 5714 - 5723
  • [8] A Heideggerian analysis of generative pretrained transformer models
    Floroiu, Iustin
    Timisica, Daniela
    [J]. ROMANIAN JOURNAL OF INFORMATION TECHNOLOGY AND AUTOMATIC CONTROL-REVISTA ROMANA DE INFORMATICA SI AUTOMATICA, 2024, 34 (01): : 13 - 22
  • [9] Generative Models for Molecular Design
    Merz, Kenneth M., Jr.
    De Fabritiis, Gianni
    Wei, Guo-Wei
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2020, 60 (12) : 5635 - 5636
  • [10] Deep Generative Design: Integration of Topology Optimization and Generative Models
    Oh, Sangeun
    Jung, Yongsu
    Kim, Seongsin
    Lee, Ikjin
    Kang, Namwoo
    [J]. JOURNAL OF MECHANICAL DESIGN, 2019, 141 (11)