LINGO-DL: a text-based approach for molecular similarity searching

Cited by: 1
Authors
Abdo, Ammar [1 ]
Pupin, Maude [1 ]
Affiliations
[1] Univ Lille, Villeneuve d'Ascq, France
Keywords
Drug discovery; Molecular fingerprints; Ligand-based virtual screening; SMILES; LINGO;
DOI
10.1007/s10822-021-00383-9
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Line notations of chemical structures are more compact than graphs and connection tables, which makes them useful for storing and transferring large numbers of molecular structures. The simplified molecular-input line-entry system (SMILES) is the most widely used line notation: it is much easier to use and understand than the alternatives, and it can be generated automatically from connection tables. A SMILES string encodes the structure of a molecule. An existing method, LINGO, uses SMILES to calculate molecular similarities and to predict structure-related properties. The LINGO method decomposes a canonical SMILES into a set of four-character substrings referred to as LINGOs, and measures the similarity between a pair of molecules by comparing the LINGOs that occur in each molecule. This paper introduces LINGO-DL, an alternative version of the LINGO method that uses LINGOs of different lengths: it fragments a canonical SMILES into substrings of three different lengths rather than the single length used in the LINGO method. Retrospective virtual screening experiments with the MDDR, DUD, and MUV datasets show that LINGO-DL outperforms the LINGO method, especially when the active molecules being sought have a high degree of structural heterogeneity.
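The substring decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state which three lengths LINGO-DL uses, how the lengths are combined, or which similarity coefficient is applied, so the lengths `(3, 4, 5)`, the pooling of all substrings into one multiset, and the multiset Tanimoto score below are all assumptions made for illustration. The function names are hypothetical, the inputs are assumed to be canonical SMILES already, and any SMILES preprocessing is omitted.

```python
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count every overlapping substring of length q in a SMILES string."""
    # A string shorter than q yields an empty Counter.
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def tanimoto(a: Counter, b: Counter) -> float:
    """Multiset Tanimoto: shared counts over combined counts (assumed scoring)."""
    shared = sum((a & b).values())  # element-wise minimum of counts
    total = sum((a | b).values())   # element-wise maximum of counts
    return shared / total if total else 0.0


def lingo_similarity(s1: str, s2: str, q: int = 4) -> float:
    """Single-length comparison, as in the four-character LINGO scheme."""
    return tanimoto(lingos(s1, q), lingos(s2, q))


def multi_length_similarity(s1: str, s2: str, lengths=(3, 4, 5)) -> float:
    """Multi-length variant. ASSUMPTION: the lengths (3, 4, 5) and the pooling
    of all substrings into one multiset are illustrative, not from the paper."""
    a, b = Counter(), Counter()
    for q in lengths:
        a.update(lingos(s1, q))
        b.update(lingos(s2, q))
    return tanimoto(a, b)


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    salicylic_acid = "O=C(O)c1ccccc1O"
    print(lingo_similarity(aspirin, salicylic_acid))
    print(multi_length_similarity(aspirin, salicylic_acid))
```

Because the comparison is purely textual, two different SMILES strings for the same molecule can score below 1.0, which is why the method operates on canonical SMILES.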
Pages: 657-665
Page count: 9
Related papers
50 in total
  • [1] LINGO-DL: a text-based approach for molecular similarity searching
    Ammar Abdo
    Maude Pupin
    [J]. Journal of Computer-Aided Molecular Design, 2021, 35 : 657 - 665
  • [2] Text-based database searching
    Lewitter, F
    [J]. TRENDS IN BIOTECHNOLOGY, 1998, : 3 - 5
  • [3] Text-based similarity searching for hit- and lead-candidate identification
    Volker Hähnke
    [J]. Journal of Cheminformatics, 4 (Suppl 1)
  • [4] Implementing a text based method (Lingo) using Finite State Machines for fast similarity searching
    Grant, J. Andrew
    Haigh, James
    Sayle, Roger
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2006, 232 : 26 - 26
  • [5] Pharmacophore Alignment Search Tool: Influence of the Third Dimension on Text-Based Similarity Searching
    Haehnke, Volker
    Klenner, Alexander
    Rippmann, Friedrich
    Schneider, Gisbert
    [J]. JOURNAL OF COMPUTATIONAL CHEMISTRY, 2011, 32 (08) : 1618 - 1634
  • [6] Pharmacophore Alignment Search Tool: Influence of Scoring Systems on Text-Based Similarity Searching
    Haehnke, Volker
    Schneider, Gisbert
    [J]. JOURNAL OF COMPUTATIONAL CHEMISTRY, 2011, 32 (08) : 1635 - 1647
  • [7] Impact of Glaucoma and Dry Eye on Text-Based Searching
    Sun, Michelle J.
    Rubin, Gary S.
    Akpek, Esen K.
    Ramulu, Pradeep Y.
    [J]. TRANSLATIONAL VISION SCIENCE & TECHNOLOGY, 2017, 6 (03):
  • [8] Text-based Document Similarity Matching Using sdtext
    Shields, Clay
    [J]. PROCEEDINGS OF THE 49TH ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (HICSS 2016), 2016, : 5607 - 5616
  • [9] Evaluating text-based similarity measures for musical content
    Garay, A
    [J]. SECOND INTERNATIONAL CONFERENCE ON WEB DELIVERING OF MUSIC, PROCEEDINGS, 2002, : 49 - 55
  • [10] Text-Based Emotion Recognition Approach
    Razek, Mohammed Abdel
    Frasson, Claude
    [J]. INTELLIGENT TUTORING SYSTEMS, ITS 2016, 2016, 9684 : 500 - 501