Handling Imbalance Classification Virtual Screening Big Data Using Machine Learning Algorithms

Cited by: 11
Authors
Hussin, Sahar K. [1]
Abdelmageid, Salah M. [2]
Alkhalil, Adel [3]
Omar, Yasser M. [4]
Marie, Mahmoud I. [5]
Ramadan, Rabie A. [3, 6]
Affiliations
[1] Alshrouck Acad, Commun & Comp Engn Dept, Cairo, Egypt
[2] Taibah Univ, Comp Engn Dept, Coll Comp Sci & Engn, Medina, Saudi Arabia
[3] Univ Hail, Coll Comp Sci & Engn, Hail, Saudi Arabia
[4] Arab Acad Sci Technol & Maritime Transport, Cairo, Egypt
[5] Al Azhar Univ, Comp & Syst Engn Dept, Cairo, Egypt
[6] Cairo Univ, Comp Engn Dept, Cairo, Egypt
Keywords
K-means clustering;
DOI
10.1155/2021/6675279
Chinese Library Classification (CLC) number
O1 [Mathematics];
Discipline classification code
0701; 070101;
Abstract
Virtual screening is the most critical process in drug discovery, and it relies on machine learning to facilitate the screening process. It enables the discovery of molecules that bind to a specific protein to form a drug. Despite its benefits, virtual screening generates enormous amounts of data and suffers from drawbacks such as high dimensionality and class imbalance. This paper tackles data imbalance and aims to improve virtual screening accuracy, especially for the minority class. When a dataset is classified without considering its imbalanced nature, most classification methods tend to achieve high predictive accuracy for the majority class but significantly poorer accuracy for the minority class. The paper proposes a K-means algorithm coupled with the Synthetic Minority Oversampling Technique (SMOTE) to overcome the problem of imbalanced datasets. The proposed algorithm is named KSMOTE. With KSMOTE, minority-class data can be identified with high accuracy and detected with high precision. A large set of experiments was implemented on Apache Spark using numeric PaDEL and fingerprint descriptors. The proposed solution was compared with both a no-sampling baseline and SMOTE on the same datasets. Experimental results showed that the proposed solution outperformed the other methods.
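The abstract does not say how KSMOTE combines its two steps, so the following is only a minimal sketch of one plausible reading: cluster the minority class with K-means, then generate SMOTE-style synthetic samples by interpolating between neighbours inside each cluster. All function names (`kmeans`, `smote_like`, `cluster_then_oversample`), the parameter defaults, and the toy data are assumptions made for illustration; they are not taken from the paper, and the paper's experiments ran on Apache Spark rather than plain Python.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def smote_like(samples, n_new, k_neighbors=3, rng=None):
    """SMOTE-style step: interpolate between a sample and a near neighbour."""
    rng = rng or random.Random(0)
    if len(samples) < 2:
        return []  # nothing to interpolate with
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        neighbours = sorted(
            (s for s in samples if s is not base), key=lambda s: dist(base, s)
        )[:k_neighbors]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def cluster_then_oversample(minority, n_majority, k=2, seed=0):
    """Hypothetical combination: oversample inside each minority cluster."""
    rng = random.Random(seed)
    labels = kmeans(minority, k, seed=seed)
    needed = max(0, n_majority - len(minority))
    result = list(minority)
    for c in range(k):
        members = [p for p, lab in zip(minority, labels) if lab == c]
        # Each cluster gets a share of new samples proportional to its size.
        share = round(needed * len(members) / len(minority))
        result.extend(smote_like(members, share, rng=rng))
    return result


# Toy minority class with two well-separated groups (illustrative data only).
minority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
balanced = cluster_then_oversample(minority, n_majority=20)
print(len(minority), "->", len(balanced))
```

Interpolating only within a cluster keeps synthetic points close to real minority samples instead of placing them on segments that cross the gap between distant groups, which is one common motivation for pairing clustering with SMOTE.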
Pages: 15
Related papers
50 in total
  • [41] Stator Imbalance Defects Diagnosis of Induction Machine Using Thermography and Machine Learning Algorithms
    El Idrissi, Abderrahman
    Derouich, Aziz
    Mahfoud, Said
    El Ouanjli, Najib
    Byou, Abdelilah
    Banakhr, Fahd A.
    Mosaad, Mohamed I.
    IEEE ACCESS, 2024, 12 : 51606 - 51618
  • [42] Classification of Cardiac Arrhythmias Using Machine Learning Algorithms
    Garcia-Aquino, Christian
    Mujica-Vargas, Dante
    Matuz-Cruz, Manuel
    TELEMATICS AND COMPUTING, WITCOM 2021, 2021, 1430 : 174 - 185
  • [43] Zonda wind classification using machine learning algorithms
    Otero, Federico
    Araneo, Diego
    INTERNATIONAL JOURNAL OF CLIMATOLOGY, 2021, 41 (S1) : E342 - E353
  • [44] Water Quality Classification Using Machine Learning Algorithms
    Alnaqeb, Reem
    Alketbi, Khuloud
    Alrashdi, Fatema
    Ismail, Heba
    2022 IEEE/ACS 19TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2022,
  • [45] Classification of SSH Attacks using Machine Learning Algorithms
    Sadasivam, Gokul Kannan
    Hota, Chittaranjan
    Anand, Bhojan
    2016 6TH INTERNATIONAL CONFERENCE ON IT CONVERGENCE AND SECURITY (ICITCS 2016), 2016, : 260 - 265
  • [46] Protostellar classification using supervised machine learning algorithms
    O. Miettinen
    Astrophysics and Space Science, 2018, 363
  • [47] Water quality classification using machine learning algorithms
    Nasir, Nida
    Kansal, Afreen
    Alshaltone, Omar
    Barneih, Feras
    Sameer, Mustafa
    Shanableh, Abdallah
    Al-Shamma'a, Ahmed
    JOURNAL OF WATER PROCESS ENGINEERING, 2022, 48
  • [48] Classification of Customer Reviews Using Machine Learning Algorithms
    Noori, Behrooz
    APPLIED ARTIFICIAL INTELLIGENCE, 2021, 35 (08) : 567 - 588
  • [49] Protostellar classification using supervised machine learning algorithms
    Miettinen, O.
    ASTROPHYSICS AND SPACE SCIENCE, 2018, 363 (09)
  • [50] Liver Diseases Classification Using Machine Learning Algorithms
    Jovovic, Ivan
    Grebovic, Marko
    Pokvic, Lejla Gurbeta
    Popovic, Tomo
    Cakic, Stevan
    MEDICON 2023 AND CMBEBIH 2023, VOL 1, 2024, 93 : 585 - 593