Predicting Chronic Kidney Disease Using Hybrid Machine Learning Based on Apache Spark

Cited by: 15
Authors
Abdel-Fattah, Manal A. [1]
Othman, Nermin Abdelhakim [1,2]
Goher, Nagwa [1,3]
Affiliations
[1] Helwan Univ, Fac Comp & Artificial Intelligence, Dept Informat Syst, Helwan, Egypt
[2] British Univ, Fac Informat & Comp Sci, Cairo, Egypt
[3] Nahda Univ Beni Suef, Fac Comp Sci, Dept Informat Syst, Bani Suwayf, Egypt
Keywords
BIG DATA;
DOI
10.1155/2022/9898831
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Chronic kidney disease (CKD) has become widespread. It carries a heightened risk of serious conditions such as cardiovascular disease and end-stage renal disease, which can feasibly be avoided through early detection and treatment of people at risk. Machine learning algorithms can help medical scientists diagnose the disease accurately at an early stage. Recently, big data platforms have been integrated with machine learning algorithms to add value to healthcare. This paper therefore proposes hybrid machine learning techniques that combine feature selection methods with machine learning classification algorithms on a big data platform (Apache Spark) to detect CKD. Two feature selection techniques, Relief-F and the chi-squared method, were applied to select the important features. Six machine learning classification algorithms were used in this research: decision tree (DT), logistic regression (LR), Naive Bayes (NB), Random Forest (RF), support vector machine (SVM), and Gradient-Boosted Trees (GBT Classifier) as ensemble learning algorithms. Four evaluation metrics, namely accuracy, precision, recall, and F1-measure, were used to validate the results. For each algorithm, cross-validation and testing results were computed on the full feature set, the features selected by Relief-F, and the features selected by the chi-squared method. The results showed that the SVM, DT, and GBT classifiers with the selected features achieved the best performance, at 100% accuracy. Overall, the features selected by Relief-F performed better than both the full feature set and the features selected by the chi-squared method.
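The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes a binary task with one positive class (CKD = 1) and uses the standard positive-class definitions. The abstract does not say how the authors averaged these metrics or which Spark evaluator they used, so the function name `evaluate_binary`, the `BinaryMetrics` container, and the sample labels are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate_binary(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Ratios with an empty denominator are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical labels: 1 = CKD, 0 = not CKD (illustrative, not from the paper).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = evaluate_binary(y_true, y_pred)
print(metrics)
```

Reporting all four metrics matters because accuracy alone can look strong on an imbalanced dataset, while precision and recall show how the positive class is actually handled.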
Pages: 12