Artificial intelligence paradigm for ligand-based virtual screening on the drug discovery of type 2 diabetes mellitus

Cited by: 9
Authors
Bustamam, Alhadi [1]
Hamzah, Haris [1]
Husna, Nadya A. [1]
Syarofina, Sarah [1]
Dwimantara, Nalendra [1]
Yanuar, Arry [2]
Sarwinda, Devvi [1]
Affiliations
[1] Univ Indonesia, Fac Math & Nat Sci, Dept Math, Depok, Indonesia
[2] Univ Indonesia, Fac Pharm, Gedung A Rumpun Ilmu Kesehatan Lantai 1, Depok, Indonesia
Keywords
Quantitative structure-activity relationship; K-modes clustering; CatBoost; Rotation Forest; Principal component analysis; Sparse principal component analysis; Deep neural network; Fingerprint; Physicochemical parameters; QSAR
DOI
10.1186/s40537-021-00465-3
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Background: New dipeptidyl peptidase-4 (DPP-4) inhibitors with few adverse effects are needed for the treatment of type 2 diabetes mellitus. This study builds quantitative structure-activity relationship (QSAR) models using an artificial intelligence paradigm. Rotation Forest and a deep neural network (DNN) serve as the QSAR models. We compared principal component analysis (PCA) with sparse PCA (SPCA) as the feature transformation inside Rotation Forest. K-modes clustering with Levenshtein distance was used to select molecules, and CatBoost was used for feature selection.

Results: K-modes clustering selected 1020 DPP-4 inhibitor molecules, with logP values ranging from -1.6693 to 4.99044. Extended connectivity fingerprints and functional class fingerprints with diameters of 4 and 6 were used to construct four fingerprint datasets: ECFP_4, ECFP_6, FCFP_4, and FCFP_6. The 1024 features from the four fingerprint datasets were then subjected to feature selection with CatBoost. With CatBoost feature selection, both the machine learning and the deep learning QSAR models performed well: sensitivity, specificity, accuracy, and Matthews correlation coefficient all exceeded 70% at feature importance levels of 60%, 70%, 80%, and 90%.

Conclusion: The K-modes clustering algorithm can produce a representative subset of DPP-4 inhibitor molecules. Feature selection on the fingerprint datasets using CatBoost is best applied before building QSAR classification and QSAR regression models. QSAR classification using machine learning and QSAR classification using deep learning each achieve an accuracy above 70%. The QSAR RFC-PCA and QSAR RFR-PCA models performed better than the QSAR RFC-SPCA and QSAR RFR-SPCA models because they are more time-efficient.
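
The four evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The sketch below is an illustration, not the authors' code: the function name `classification_metrics` and the example counts are hypothetical. Note that the Matthews correlation coefficient lies in [-1, 1]; the abstract quotes it as a percentage, which presumably corresponds to multiplying by 100.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity, accuracy, and MCC from confusion counts.

    tp/fn count active molecules predicted active/inactive;
    tn/fp count inactive molecules predicted inactive/active.
    Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    accuracy = (tp + tn) / total if total else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the paper.
    metrics = classification_metrics(tp=80, fp=20, tn=85, fn=15)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when active and inactive molecules are imbalanced.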
Pages: 21
Source
Journal of Big Data, 8