Probabilistic generative transformer language models for generative design of molecules

Cited by: 4
Authors
Wei, Lai [1 ]
Fu, Nihang [1 ]
Song, Yuqi [1 ]
Wang, Qian [2 ]
Hu, Jianjun [1 ]
Affiliations
[1] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29201 USA
[2] Univ South Carolina, Dept Chem & Biochem, Columbia, SC 29201 USA
Funding
US National Science Foundation;
Keywords
Deep learning; Language models; Molecules generator; Molecules discovery; Blank filling; SMILES;
DOI
10.1186/s13321-023-00759-z
CLC Classification
O6 [Chemistry];
Discipline Code
0703;
Abstract
Self-supervised neural language models have recently found wide applications in the generative design of organic molecules and protein sequences, as well as in representation learning for downstream structure classification and functional prediction. However, most existing deep learning models for molecule design typically require large datasets and have black-box architectures, making it difficult to interpret their design logic. Here we propose the Generative Molecular Transformer (GMTransformer), a probabilistic neural network model for the generative design of molecules. Our model is built on the blank-filling language model originally developed for text processing, which has demonstrated unique advantages in learning "molecule grammars" with high-quality generation, interpretability, and data efficiency. Benchmarked on the MOSES datasets, our models achieve high novelty and Scaf scores compared to other baselines. The probabilistic generation steps have potential for tinkering with molecule design due to their capability of recommending how to modify existing molecules with explanation, guided by the learned implicit molecule chemistry. The source code and datasets can be accessed freely at https://github.com/usccolumbia/GMTransformer.
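The abstract describes blank-filling generation: the model repeatedly fills blank positions in a partial SMILES string with tokens drawn from a learned distribution, so each fill step is an explainable edit. The toy sketch below illustrates only that control flow; the vocabulary, probabilities, and fill policy are invented for illustration and are not the paper's actual model, which learns token distributions with a transformer.

```python
import random

# Hypothetical token probabilities standing in for the learned model's
# output distribution at a blank position (assumption, not from the paper).
VOCAB_PROBS = {
    "C": 0.5,   # carbon
    "O": 0.2,   # oxygen
    "N": 0.2,   # nitrogen
    "=": 0.1,   # double bond
}

def fill_blanks(canvas, rng, max_steps=10):
    """Repeatedly replace '_' blanks in a SMILES token canvas with
    sampled tokens, recording each edit so the generation is traceable."""
    trace = []
    for _ in range(max_steps):
        if "_" not in canvas:
            break
        i = canvas.index("_")  # leftmost blank (a simplifying policy choice)
        tokens, weights = zip(*VOCAB_PROBS.items())
        tok = rng.choices(tokens, weights=weights)[0]
        canvas[i] = tok
        trace.append((i, tok))  # the explainable per-step edit
    return "".join(canvas), trace

rng = random.Random(0)
smiles, trace = fill_blanks(["C", "_", "_", "O"], rng)
```

In the actual GMTransformer, the per-blank distribution comes from the trained network rather than a fixed table, and the trace of fill decisions is what enables the "modify existing molecules with explanation" use case mentioned above.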
Pages: 15
Related Papers (50 total)
  • [21] Foundation Models, Generative AI, and Large Language Models
    Ross, Angela
    McGrow, Kathleen
    Zhi, Degui
    Rasmy, Laila
    [J]. CIN-COMPUTERS INFORMATICS NURSING, 2024, 42 (05) : 377 - 387
  • [22] Probabilistic Constraint Programming for Parameters Optimisation of Generative Models
    Zanin, Massimiliano
    Correia, Marco
    Sousa, Pedro A. C.
    Cruz, Jorge
    [J]. PROGRESS IN ARTIFICIAL INTELLIGENCE-BK, 2015, 9273 : 376 - 387
  • [23] Deep generative models for peptide design
    Wan, Fangping
    Kontogiorgos-Heintz, Daphne
    de la Fuente-Nunez, Cesar
    [J]. DIGITAL DISCOVERY, 2022, 1 (03): : 195 - 208
  • [24] Probabilistic Typology: Deep Generative Models of Vowel Inventories
    Cotterell, Ryan
    Eisner, Jason
    [J]. PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 1182 - 1192
  • [25] The Advent of Generative Language Models in Medical Education
    Karabacak, Mert
    Ozkara, Burak Berksu
    Margetis, Konstantinos
    Wintermark, Max
    Bisdas, Sotirios
    [J]. JMIR MEDICAL EDUCATION, 2023, 9
  • [26] Application of generative language models to orthopaedic practice
    Caterson, Jessica
    Ambler, Olivia
    Cereceda-Monteoliva, Nicholas
    Horner, Matthew
    Jones, Andrew
    Poacher, Arwel Tomos
    [J]. BMJ OPEN, 2024, 14 (03):
  • [27] Generative Relevance Feedback with Large Language Models
    Mackie, Iain
    Chatterjee, Shubham
    Dalton, Jeffrey
    [J]. PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 2026 - 2031
  • [28] Journal policy on large language generative models
    Sessler, Daniel I.
    Turan, Alparslan
    [J]. JOURNAL OF CLINICAL ANESTHESIA, 2024, 96
  • [29] Probabilistic generative modelling
    Larsen, R
    Hilger, KB
    [J]. IMAGE ANALYSIS, PROCEEDINGS, 2003, 2749 : 861 - 868
  • [30] Computational Discovery of TTF Molecules with Deep Generative Models
    Yakubovich, Alexander
    Odinokov, Alexey
    Nikolenko, Sergey
    Jung, Yongsik
    Choi, Hyeonho
    [J]. FRONTIERS IN CHEMISTRY, 2021, 9