Predicting Diabetes using Distributed Machine Learning based on Apache Spark

Citations: 0
Authors
Ahmed, Hager [1]
Younis, Eman M. G. [1]
Ali, Abdelmgeid A. [2]
Affiliations
[1] Fac Comp & Informat, Dept Informat Syst, Al Minya, Egypt
[2] Fac Comp & Informat, Dept Comp Sci, Al Minya, Egypt
Keywords
Diabetes; Distributed Machine Learning; Apache Spark; Support Vector Machines
DOI
10.1109/itce48509.2020.9047795
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Diabetes mellitus is a chronic disease and a severe challenge for public health worldwide. According to the International Diabetes Federation, about 246 million people worldwide currently have diabetes, and this number is expected to rise to around 380 million by 2025. In addition, complications of diabetes cause 3.8 million deaths annually. The primary objective of this work is to develop a practical system that predicts diabetes using distributed machine learning on big data platforms such as Apache Spark. Five machine learning classification methods were used: Decision Tree, Support Vector Machine, Logistic Regression, Naive Bayes, and Random Forest. The algorithms were compared using three measures: accuracy, recall, and precision. The experimental results showed that Logistic Regression (LR) achieved the highest accuracy, recall, and precision, at 82%, 92%, and 82%, respectively.
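The three comparison measures named in the abstract can be illustrated with a minimal Python sketch. It assumes binary labels where 1 means diabetic and 0 means non-diabetic; the function name `classification_metrics` and the sample labels are hypothetical and are not taken from the paper, whose own evaluation code and data are not shown in this record.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    return {
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative labels only (not data from the paper).
example_true = [1, 1, 1, 0, 0, 1, 0, 1]
example_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(classification_metrics(example_true, example_pred))
```

In a diabetes-screening setting, recall is often watched most closely, since a false negative corresponds to a missed case.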
Pages: 44-49
Page count: 6