Probabilistic generative transformer language models for generative design of molecules

Cited by: 4
Authors
Wei, Lai [1 ]
Fu, Nihang [1 ]
Song, Yuqi [1 ]
Wang, Qian [2 ]
Hu, Jianjun [1 ]
Affiliations
[1] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29201 USA
[2] Univ South Carolina, Dept Chem & Biochem, Columbia, SC 29201 USA
Funding
U.S. National Science Foundation;
Keywords
Deep learning; Language models; Molecules generator; Molecules discovery; Blank filling; SMILES;
DOI
10.1186/s13321-023-00759-z
CLC number
O6 [Chemistry];
Subject classification code
0703;
Abstract
Self-supervised neural language models have recently found wide application in the generative design of organic molecules and protein sequences, as well as in representation learning for downstream structure classification and function prediction. However, most existing deep learning models for molecule design require large datasets and have black-box architectures, which makes their design logic difficult to interpret. Here we propose the Generative Molecular Transformer (GMTransformer), a probabilistic neural network model for the generative design of molecules. Our model is built on the blank-filling language model originally developed for text processing, which has demonstrated unique advantages in learning "molecule grammars" with high-quality generation, interpretability, and data efficiency. Benchmarked on the MOSES datasets, our models achieve high novelty and Scaf (scaffold) scores compared with other baselines. The probabilistic generation steps have potential for tinkering with molecule designs, since they can recommend how to modify existing molecules with explanations, guided by the learned implicit molecule chemistry. The source code and datasets are freely available at https://github.com/usccolumbia/GMTransformer.
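The blank-filling formulation described in the abstract can be illustrated with a minimal, self-contained sketch: a SMILES string is tokenized, one token is blanked out, and a model ranks candidate tokens to fill the blank. This is not the authors' implementation; the tokenizer is simplified and a hypothetical unigram frequency table stands in for GMTransformer's learned conditional distribution.

```python
import re
from collections import Counter

# Minimal SMILES tokenizer: multi-character atom tokens (Cl, Br) first,
# then single-character atoms, ring digits, bonds, and brackets.
TOKEN_RE = re.compile(r"Cl|Br|[A-Za-z0-9=#()\[\]@+\-\\/]")

def tokenize(smiles):
    return TOKEN_RE.findall(smiles)

# Toy "language model": unigram frequencies over a tiny hypothetical corpus
# stand in for the learned distribution of a real blank-filling transformer.
corpus = ["CCO", "CC(=O)O", "c1ccccc1", "CC(C)O", "CCN"]
counts = Counter(tok for s in corpus for tok in tokenize(s))
total = sum(counts.values())

def fill_blank(tokens, blank_idx, top_k=3):
    """Rank candidate tokens for the blank position by probability.

    A real model would condition on the surrounding context tokens;
    this toy version ignores context and uses unigram frequencies only.
    """
    return [(tok, n / total) for tok, n in counts.most_common(top_k)]

tokens = tokenize("CC(=O)O")
tokens[1] = "_"              # blank out the second token
print(tokens)                # ['C', '_', '(', '=', 'O', ')', 'O']
print(fill_blank(tokens, 1)) # ranked (token, probability) candidates
```

Each ranked candidate comes with an explicit probability, which mirrors the abstract's point that probabilistic generation steps can explain a recommended modification rather than emit it opaquely.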
Pages: 15