LINGO-DL: a text-based approach for molecular similarity searching

Cited by: 1
Authors
Abdo, Ammar [1]
Pupin, Maude [1]
Affiliations
[1] Univ Lille, Villeneuve d'Ascq, France
Keywords
Drug discovery; Molecular fingerprints; Ligand-based virtual screening; SMILES; LINGO;
DOI
10.1007/s10822-021-00383-9
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Line notations of chemical structures are more compact than graphs and connection tables, which makes them useful for storing and transferring large numbers of molecular structures. The simplified molecular-input line-entry system (SMILES) is the most widely used line notation: it is easier to use and understand than the alternatives, and it can be generated automatically from connection tables. A SMILES string encodes the structure of a molecule. An existing method, LINGO, uses SMILES to calculate molecular similarities and to predict structure-related properties. The LINGO method decomposes a canonical SMILES string into a set of four-character substrings, referred to as LINGOs, and measures the similarity between a pair of molecules by comparing the LINGOs that occur in each. This paper introduces LINGO-DL, an alternative version of the LINGO method that uses LINGOs of different lengths: it fragments a canonical SMILES string into substrings of three different lengths rather than the single length used by the LINGO method. Retrospective virtual screening experiments on the MDDR, DUD, and MUV datasets show that LINGO-DL outperforms the LINGO method, especially when the active molecules being sought have a high degree of structural heterogeneity.
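The fragmentation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the substring lengths (3, 4, 5) used for the multi-length variant, and the multiset Tanimoto coefficient are all assumptions made for illustration: the abstract does not state which three lengths LINGO-DL uses or the exact similarity formula. The sketch also assumes the input is already a canonical SMILES string and skips any preprocessing of the string.

```python
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count every overlapping substring of length q in a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingos_multi(smiles: str, lengths=(3, 4, 5)) -> Counter:
    """Pool substrings of several lengths into one profile.

    The default lengths are placeholders, not values taken from the paper.
    """
    profile = Counter()
    for q in lengths:
        profile.update(lingos(smiles, q))
    return profile


def tanimoto(a: Counter, b: Counter) -> float:
    """Multiset Tanimoto: shared counts divided by the union of counts.

    This is an illustrative choice of similarity coefficient.
    """
    shared = sum((a & b).values())  # element-wise minimum of counts
    total = sum((a | b).values())   # element-wise maximum of counts
    return shared / total if total else 0.0


if __name__ == "__main__":
    phenol, aniline = "c1ccccc1O", "c1ccccc1N"
    print(tanimoto(lingos(phenol), lingos(aniline)))
    print(tanimoto(lingos_multi(phenol), lingos_multi(aniline)))
```

With a single length of four, the two example strings share five of seven substring occurrences, so the first value is 5/7. Pooling several lengths changes how much weight short and long fragments carry in the comparison, which is the idea the abstract attributes to LINGO-DL.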
Pages: 657-665
Page count: 9