Handling Imbalance Classification Virtual Screening Big Data Using Machine Learning Algorithms

Cited by: 11
Authors
Hussin, Sahar K. [1]
Abdelmageid, Salah M. [2]
Alkhalil, Adel [3]
Omar, Yasser M. [4]
Marie, Mahmoud I. [5]
Ramadan, Rabie A. [3, 6]
Affiliations
[1] Alshrouck Acad, Commun & Comp Engn Dept, Cairo, Egypt
[2] Taibah Univ, Comp Engn Dept, Coll Comp Sci & Engn, Medina, Saudi Arabia
[3] Univ Hail, Coll Comp Sci & Engn, Hail, Saudi Arabia
[4] Arab Acad Sci Technol & Maritime Transport, Cairo, Egypt
[5] Al Azhar Univ, Comp & Syst Engn Dept, Cairo, Egypt
[6] Cairo Univ, Comp Engn Dept, Cairo, Egypt
Keywords
K-means clustering
DOI
10.1155/2021/6675279
Chinese Library Classification (CLC) number
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
Virtual screening is the most critical process in drug discovery, and it relies on machine learning to facilitate the screening process. It enables the discovery of molecules that bind to a specific protein to form a drug. Despite its benefits, virtual screening generates enormous amounts of data and suffers from drawbacks such as high dimensionality and class imbalance. This paper tackles data imbalance and aims to improve virtual screening accuracy, especially for the minority class. When a dataset's imbalanced nature is not taken into account, most classification methods achieve high predictive accuracy for the majority class but markedly poor accuracy for the minority class. The paper proposes a K-means algorithm coupled with the Synthetic Minority Oversampling Technique (SMOTE) to overcome the problem of imbalanced datasets. The proposed algorithm is named KSMOTE. Using KSMOTE, minority data can be identified with high accuracy and detected with high precision. A large set of experiments was implemented on Apache Spark using numeric PaDEL and fingerprint descriptors. The proposed solution was compared to both a no-sampling method and SMOTE on the same datasets. Experimental results showed that the proposed solution outperformed the other methods.
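The abstract does not say how K-means and SMOTE are combined in KSMOTE, so the following is only a rough sketch of one plausible combination, not the authors' implementation. It assumes the minority samples are first grouped with K-means and that SMOTE-style interpolation is then applied only between members of the same cluster. The function names (kmeans, smote, ksmote_sketch) and all parameters are hypothetical, and plain Python stands in for the Apache Spark setup used in the paper.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def smote(samples, n_new, k_neighbors, rng):
    """SMOTE-style oversampling: interpolate between a sample and a near neighbor."""
    if len(samples) < 2:
        return []  # nothing to interpolate with
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Nearest neighbors of `base` among the other samples.
        neighbors = sorted(
            (s for s in samples if s is not base), key=lambda s: dist(base, s)
        )[:k_neighbors]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def ksmote_sketch(minority, n_new, k_clusters=2, k_neighbors=3, seed=0):
    """Assumed combination: cluster the minority class, then oversample per cluster."""
    rng = random.Random(seed)
    labels = kmeans(minority, k_clusters, seed=seed)
    synthetic = []
    for c in range(k_clusters):
        members = [p for p, lab in zip(minority, labels) if lab == c]
        # Each cluster receives a share proportional to its size; because of
        # rounding, the total may differ slightly from n_new.
        share = round(n_new * len(members) / len(minority))
        synthetic.extend(smote(members, share, k_neighbors, rng))
    return synthetic
```

Calling ksmote_sketch(minority_points, n_new=100), with minority_points given as a list of equal-length numeric tuples, returns synthetic minority samples that can be appended to the training set before fitting a classifier. Restricting interpolation to one cluster keeps each synthetic point inside a region already occupied by minority samples, which is the usual motivation for pairing clustering with SMOTE.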
Pages: 15