Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning

Cited by: 59
Authors
Alotaibi, Shoayee [1]
Mehmood, Rashid [2]
Katib, Iyad [1]
Rana, Omer [3]
Albeshri, Aiiad [1]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol, Comp Sci Dept, Jeddah 21589, Saudi Arabia
[2] King Abdulaziz Univ, High Performance Comp Ctr, Jeddah 21589, Saudi Arabia
[3] Cardiff Univ, Sch Comp Sci, Cardiff CF10 3AT, Wales
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 04
Keywords
smart cities; healthcare; Apache Spark; disease detection; symptoms detection; Arabic language; Saudi dialect; Twitter; machine learning; big data; high performance computing (HPC); ENTERPRISE SYSTEMS; ARABIC TWEETS; TRANSPORT; PRIVACY; LOGISTICS; SECURITY; IOT;
DOI
10.3390/app10041398
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Smartness, which underpins smart cities and societies, is defined by our ability to engage with our environments, analyze them, and make decisions, all in a timely manner. Healthcare is the prime candidate needing the transformative capability of this smartness. Social media could enable a ubiquitous and continuous engagement between healthcare stakeholders, leading to better public health. Current works are limited in their scope, functionality, and scalability. This paper proposes Sehaa, a big data analytics tool for healthcare in the Kingdom of Saudi Arabia (KSA) using Twitter data in Arabic. Sehaa uses Naive Bayes, Logistic Regression, and multiple feature extraction methods to detect various diseases in the KSA. Sehaa found that the top five diseases in Saudi Arabia in terms of the actual afflicted cases are dermal diseases, heart diseases, hypertension, cancer, and diabetes. Riyadh and Jeddah need to do more in creating awareness about the top diseases. Taif is the healthiest city in the KSA in terms of the detected diseases and awareness activities. Sehaa is developed over Apache Spark allowing true scalability. The dataset used comprises 18.9 million tweets collected from November 2018 to September 2019. The results are evaluated using well-known numerical criteria (Accuracy and F1-Score) and are validated against externally available statistics.
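The two evaluation criteria named in the abstract, Accuracy and F1-Score, can be illustrated with a minimal sketch. The function name `accuracy_and_f1`, the binary framing, and the toy labels below are assumptions made for illustration only; they are not taken from the paper, which may compute these metrics differently (for example per class, or with Spark's built-in evaluators).

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for two equal-length label sequences.

    Hypothetical helper: `positive` marks the class of interest,
    e.g. "tweet reports a disease" in a binary setting.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Toy labels, invented for this example (1 = disease-related, 0 = not).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so it penalizes a classifier that scores well on Accuracy only because one class dominates the data.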
Pages: 29
Related papers
50 in total
  • [1] Big data Predictive Analytics for Apache Spark using Machine Learning
    Junaid, Muhammad
    Wagan, Shiraz Ali
    Qureshi, Nawab Muhammad Faseeh
    Nam, Choon Sung
    Shin, Dong Ryeol
    [J]. 2020 GLOBAL CONFERENCE ON WIRELESS AND OPTICAL TECHNOLOGIES (GCWOT), 2020,
  • [2] Effective Selection of Machine Learning Algorithms for Big Data Analytics Using Apache Spark
    Hafez, Manar Mohamed
    Shehab, Mohamed Elemam
    El Fakharany, Essam
    Hegazy, Abd El Ftah Abdel Ghfar
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2016, 2017, 533 : 692 - 704
  • [3] Road Traffic Event Detection Using Twitter Data, Machine Learning, and Apache Spark
    Alomari, Ebtesam
    Mehmood, Rashid
    Katib, Iyad
    [J]. 2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019), 2019, : 1888 - 1895
  • [4] An insight into tree based machine learning techniques for big data Analytics using Apache Spark
    Sheshasaayee, Ananthi
    Lakshmi, J. V. N.
    [J]. 2017 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING, INSTRUMENTATION AND CONTROL TECHNOLOGIES (ICICICT), 2017, : 1740 - 1743
  • [5] Mobile Big Data Analytics Using Deep Learning and Apache Spark
    Abu Alsheikh, Mohammad
    Niyato, Dusit
    Lin, Shaowei
    Tan, Hwee-Pink
    Han, Zhu
    [J]. IEEE NETWORK, 2016, 30 (03): : 22 - 29
  • [6] Big Data Machine Learning using Apache Spark MLlib
    Assefi, Mehdi
    Behravesh, Ehsun
    Liu, Guangchi
    Tafti, Ahmad P.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 3492 - 3498
  • [7] Iktishaf: a Big Data Road-Traffic Event Detection Tool Using Twitter and Spark Machine Learning
    Alomari, Ebtesam
    Katib, Iyad
    Mehmood, Rashid
    [J]. MOBILE NETWORKS & APPLICATIONS, 2023, 28 (02): : 603 - 618
  • [9] Performance Analysis of Machine Learning Techniques on Big Data Using Apache Spark
    Mogha, Garima
    Ahlawat, Khyati
    Singh, Amit Prakash
    [J]. DATA SCIENCE AND ANALYTICS, 2018, 799 : 17 - 26
  • [10] On Scalability of Distributed Machine Learning with Big Data on Apache Spark
    Hai, Ameen Abdel
    Forouraghi, Babak
    [J]. BIG DATA - BIGDATA 2018, 2018, 10968 : 209 - 219