Handling Imbalance Classification Virtual Screening Big Data Using Machine Learning Algorithms

Cited by: 11
Authors
Hussin, Sahar K. [1]
Abdelmageid, Salah M. [2]
Alkhalil, Adel [3]
Omar, Yasser M. [4]
Marie, Mahmoud I. [5]
Ramadan, Rabie A. [3, 6]
Affiliations
[1] Alshrouck Acad, Commun & Comp Engn Dept, Cairo, Egypt
[2] Taibah Univ, Comp Engn Dept, Coll Comp Sci & Engn, Medina, Saudi Arabia
[3] Univ Hail, Coll Comp Sci & Engn, Hail, Saudi Arabia
[4] Arab Acad Sci Technol & Maritime Transport, Cairo, Egypt
[5] Al Azhar Univ, Comp & Syst Engn Dept, Cairo, Egypt
[6] Cairo Univ, Comp Engn Dept, Cairo, Egypt
Keywords
K-means clustering
DOI
10.1155/2021/6675279
Chinese Library Classification (CLC) number
O1 [Mathematics]
Subject classification code
0701; 070101
Abstract
Virtual screening is one of the most critical processes in drug discovery, and it relies on machine learning to facilitate screening. It enables the discovery of molecules that bind to a specific target protein and are therefore candidates for drug development. Despite its benefits, virtual screening generates enormous amounts of data that suffer from high dimensionality and class imbalance. This paper tackles data imbalance and aims to improve virtual screening accuracy, especially for the minority class. When a classifier is trained without accounting for the data's imbalanced nature, most classification methods achieve high predictive accuracy on the majority class but significantly poorer accuracy on the minority class. To overcome this problem, the paper proposes a K-means algorithm coupled with the Synthetic Minority Oversampling Technique (SMOTE), named KSMOTE. Using KSMOTE, minority-class data can be identified with high accuracy and precision. A large set of experiments was implemented on Apache Spark using numeric PaDEL and fingerprint descriptors. The proposed solution was compared with both a no-sampling baseline and SMOTE on the same datasets, and the experimental results showed that it outperformed both.
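The abstract names the two ingredients of KSMOTE (K-means clustering and SMOTE) but not how they are combined. As an illustration only, the sketch below assumes one plausible combination: cluster the minority class with k-means, then apply classic SMOTE inside each cluster, splitting the synthetic-sample budget in proportion to cluster size. The function names (`kmeans`, `smote`, `kmeans_smote`) and parameters (`k_clusters`, `k_neighbors`) are hypothetical; this is not the authors' KSMOTE implementation, which ran on Apache Spark.

```python
import math
import random

# Hypothetical sketch of a K-means + SMOTE oversampler; NOT the authors' KSMOTE.


def _dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20, rng=random):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    k = min(k, len(points))
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; leave it in place if the cluster is empty.
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def smote(samples, n_new, k_neighbors=5, rng=random):
    """Classic SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    if not samples or n_new <= 0:
        return []
    if len(samples) == 1:
        # No neighbour to interpolate with, so the lone sample is duplicated.
        return [list(samples[0]) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: _dist(base, s))[:k_neighbors]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def kmeans_smote(minority, n_new, k_clusters=3, k_neighbors=5, rng=random):
    """Cluster the minority class, then run SMOTE inside each cluster with a
    budget proportional to the cluster's size."""
    clusters = kmeans(minority, k_clusters, rng=rng)
    total = len(minority)
    counts = [n_new * len(cl) // total for cl in clusters]
    # Hand the rounding remainder to the largest clusters first.
    by_size = sorted(range(len(clusters)), key=lambda i: -len(clusters[i]))
    for i in by_size[: n_new - sum(counts)]:
        counts[i] += 1
    synthetic = []
    for cl, c in zip(clusters, counts):
        synthetic.extend(smote(cl, c, k_neighbors, rng))
    return synthetic


if __name__ == "__main__":
    rng = random.Random(0)
    # Two well-separated minority groups in a 2-D feature space.
    minority = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(10)]
                + [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(10)])
    new_points = kmeans_smote(minority, n_new=30, k_clusters=2, rng=rng)
    print(len(new_points))  # 30
```

Restricting interpolation to neighbours within the same cluster is what the clustering step adds in this sketch: it discourages synthetic points from landing in the empty region between two well-separated minority groups, which plain SMOTE over the whole minority class can do.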
Pages: 15