Artificial intelligence paradigm for ligand-based virtual screening on the drug discovery of type 2 diabetes mellitus

Cited by: 9
Authors
Bustamam, Alhadi [1]
Hamzah, Haris [1]
Husna, Nadya A. [1]
Syarofina, Sarah [1]
Dwimantara, Nalendra [1]
Yanuar, Arry [2]
Sarwinda, Devvi [1]
Affiliations
[1] Univ Indonesia, Fac Math & Nat Sci, Dept Math, Depok, Indonesia
[2] Univ Indonesia, Fac Pharm, Gedung A Rumpun Ilmu Kesehatan Lantai 1, Depok, Indonesia
Keywords
Quantitative structure-activity relationship (QSAR); K-modes clustering; CatBoost; Rotation Forest; Principal component analysis; Sparse principal component analysis; Deep neural network; Fingerprint; Physicochemical parameters
DOI
10.1186/s40537-021-00465-3
Chinese Library Classification
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Background: New dipeptidyl peptidase-4 (DPP-4) inhibitors with low adverse effects need to be developed for the treatment of type 2 diabetes mellitus. This study aims to build quantitative structure-activity relationship (QSAR) models using the artificial intelligence paradigm. Rotation Forest and a deep neural network (DNN) are used to build the QSAR models. We compared principal component analysis (PCA) with sparse PCA (SPCA) as the feature transformation methods within Rotation Forest. K-modes clustering with Levenshtein distance was used to select molecules, and CatBoost was used for feature selection.

Results: The molecule selection process using the K-modes clustering algorithm yielded 1020 DPP-4 inhibitor molecules with logP values ranging from -1.6693 to 4.99044. Extended connectivity fingerprints and functional class fingerprints with diameters of 4 and 6 were used to construct four fingerprint datasets: ECFP_4, ECFP_6, FCFP_4, and FCFP_6. The 1024 features from the four fingerprint datasets are then selected using the CatBoost method. With CatBoost feature selection at feature importance levels of 60%, 70%, 80%, and 90%, both the machine learning and the deep learning QSAR models perform well, with sensitivity, specificity, accuracy, and Matthews correlation coefficient all above 70%.

Conclusion: The K-modes clustering algorithm can produce a representative subset of DPP-4 inhibitor molecules. Feature selection on the fingerprint datasets using CatBoost is best applied before building QSAR classification and QSAR regression models. QSAR classification using machine learning and QSAR classification using deep learning each achieve an accuracy above 70%. The QSAR RFC-PCA and QSAR RFR-PCA models performed better than the QSAR RFC-SPCA and QSAR RFR-SPCA models because they are more time-efficient.
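
The four evaluation metrics named in the abstract can all be computed from the counts of a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration. Note that the Matthews correlation coefficient ranges from -1 to 1, so a value "above 70%" corresponds to an MCC above 0.7.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy, and MCC from confusion-matrix counts.

    tp/fn: active molecules predicted active/inactive.
    tn/fp: inactive molecules predicted inactive/active.
    """
    total = tp + fn + tn + fp
    # Guard each ratio against an empty class to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, not results from the paper.
example = classification_metrics(tp=80, fn=20, tn=70, fp=30)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Reporting MCC alongside accuracy is a common choice when the active and inactive classes are imbalanced, since accuracy alone can look high for a model that mostly predicts the majority class.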
Pages: 21
Related papers
50 in total
  • [21] Ligand-Based Fluorine NMR Screening: Principles and Applications in Drug Discovery Projects
    Dalvit, Claudio
    Vulpetti, Anna
    JOURNAL OF MEDICINAL CHEMISTRY, 2019, 62 (05) : 2218 - 2244
  • [22] Ligand-Based Pharmacophore Modeling and Virtual Screening for the Discovery of Novel 17β-Hydroxysteroid Dehydrogenase 2 Inhibitors
    Vuorinen, Anna
    Engeli, Roger
    Meyer, Arne
    Bachmann, Fabio
    Griesser, Ulrich J.
    Schuster, Daniela
    Odermatt, Alex
    JOURNAL OF MEDICINAL CHEMISTRY, 2014, 57 (14) : 5995 - 6007
  • [23] Combining ligand-based and structure-based drug design in the virtual screening arena
    Moro, Stefano
    Bacilieri, Magdalena
    Deflorian, Francesca
    EXPERT OPINION ON DRUG DISCOVERY, 2007, 2 (01) : 37 - 49
  • [24] LiSiCA: A Software for Ligand-Based Virtual Screening and Its Application for the Discovery of Butyrylcholinesterase Inhibitors
    Lesnik, Samo
    Stular, Tanja
    Brus, Boris
    Knez, Damijan
    Gobec, Stanislav
    Janezic, Dusanka
    Konc, Janez
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2015, 55 (08) : 1521 - 1528
  • [25] Ligand-based receptor tyrosine kinase partial agonists: new paradigm for cancer drug discovery?
    Riese, David J., II
    EXPERT OPINION ON DRUG DISCOVERY, 2011, 6 (02) : 185 - 193
  • [26] Pitfalls in the assessment of ligand-based virtual screening accuracy
    Heifets, Abraham
    Wallach, Izhar
    Dzamba, Michael
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2016, 251
  • [27] Consensus queries in ligand-based virtual screening experiments
    Francois Berenger
    Oanh Vu
    Jens Meiler
    Journal of Cheminformatics, 9
  • [28] Novel 2D fingerprints for ligand-based virtual screening
    Ewing, Todd
    Baber, J. Christian
    Feher, Miklos
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2006, 46 (06) : 2423 - 2431
  • [29] Consensus queries in ligand-based virtual screening experiments
    Berenger, Francois
    Oanh Vu
    Meiler, Jens
    JOURNAL OF CHEMINFORMATICS, 2017, 9
  • [30] Optimal assignment methods for ligand-based virtual screening
    Andreas Jahn
    Georg Hinselmann
    Nikolas Fechner
    Andreas Zell
    Journal of Cheminformatics, 1