Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning

Cited by: 59
Authors
Alotaibi, Shoayee [1]
Mehmood, Rashid [2]
Katib, Iyad [1]
Rana, Omer [3]
Albeshri, Aiiad [1]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol, Comp Sci Dept, Jeddah 21589, Saudi Arabia
[2] King Abdulaziz Univ, High Performance Comp Ctr, Jeddah 21589, Saudi Arabia
[3] Cardiff Univ, Sch Comp Sci, Cardiff CF10 3AT, Wales
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 04
Keywords
smart cities; healthcare; Apache Spark; disease detection; symptoms detection; Arabic language; Saudi dialect; Twitter; machine learning; big data; high performance computing (HPC); ENTERPRISE SYSTEMS; ARABIC TWEETS; TRANSPORT; PRIVACY; LOGISTICS; SECURITY; IOT;
DOI
10.3390/app10041398
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Smartness, which underpins smart cities and societies, is defined by our ability to engage with our environments, analyze them, and make decisions, all in a timely manner. Healthcare is the prime candidate needing the transformative capability of this smartness. Social media could enable a ubiquitous and continuous engagement between healthcare stakeholders, leading to better public health. Current works are limited in their scope, functionality, and scalability. This paper proposes Sehaa, a big data analytics tool for healthcare in the Kingdom of Saudi Arabia (KSA) using Twitter data in Arabic. Sehaa uses Naive Bayes, Logistic Regression, and multiple feature extraction methods to detect various diseases in the KSA. Sehaa found that the top five diseases in Saudi Arabia in terms of the actual afflicted cases are dermal diseases, heart diseases, hypertension, cancer, and diabetes. Riyadh and Jeddah need to do more in creating awareness about the top diseases. Taif is the healthiest city in the KSA in terms of the detected diseases and awareness activities. Sehaa is developed over Apache Spark allowing true scalability. The dataset used comprises 18.9 million tweets collected from November 2018 to September 2019. The results are evaluated using well-known numerical criteria (Accuracy and F1-Score) and are validated against externally available statistics.
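As a rough illustration of the kind of classifier the abstract names, the following is a minimal sketch of multinomial Naive Bayes over bag-of-words token counts with Laplace smoothing. It is not the authors' implementation: Sehaa runs on Apache Spark over Arabic tweets, whereas this sketch is plain Python, and the function names (`train_nb`, `predict_nb`), the English placeholder tokens, and the two labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model.

    docs is a list of (tokens, label) pairs; alpha is the Laplace
    smoothing constant (an assumption, not a value from the paper).
    """
    label_counts = Counter()             # number of documents per label
    token_counts = defaultdict(Counter)  # token frequencies per label
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return {"labels": label_counts, "tokens": token_counts,
            "vocab": vocab, "alpha": alpha}


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["labels"].items():
        counts = model["tokens"][label]
        denom = sum(counts.values()) + model["alpha"] * vocab_size
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in model["vocab"]:          # skip unseen tokens
                score += math.log((counts[tok] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set; the real dataset is Arabic tweets.
TRAIN = [
    ("high blood sugar insulin".split(), "diabetes"),
    ("insulin dose sugar level".split(), "diabetes"),
    ("skin rash itching".split(), "dermal"),
    ("itching skin cream".split(), "dermal"),
]
model = train_nb(TRAIN)
print(predict_nb(model, "sugar insulin".split()))
print(predict_nb(model, "skin itching".split()))
```

On a cluster, the same idea would normally be expressed with a distributed library rather than hand-written loops; the sketch only shows the counting and scoring steps that such a classifier performs.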
Pages: 29
Related papers
50 in total
  • [31] Time-Series Data Analytics Using Spark and Machine Learning
    Thongtra, Patcharee
    Sapronova, Alla
    [J]. FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2017, 2017, 10352 : 509 - 515
  • [32] Use of Machine Learning in Big Data Analytics for Insider Threat Detection
    Mayhew, Michael
    Atighetchi, Michael
    Adler, Aaron
    Greenstadt, Rachel
    [J]. 2015 IEEE MILITARY COMMUNICATIONS CONFERENCE (MILCOM 2015), 2015, : 915 - 922
  • [33] Big Data Approach For IoT Botnet Traffic Detection Using Apache Spark Technology
    Arokodare, Oluwatomisin
    Wimmer, Hayden
    Du, Jie
    [J]. 2023 IEEE 13TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE, CCWC, 2023, : 1260 - 1266
  • [34] Comparative Analysis of Intrusion Detection Models using Big Data Analytics and Machine Learning Techniques
    Alaketu, Muyideen Ayodeji
    Oguntimilehin, Abiodun
    Olatunji, Kehinde Adebola
    Abiola, Oluwatoyin Bunmi
    Badeji-Ajisafe, Bukola
    Akinduyite, Christiana Olanike
    Obamiyi, Stephen Eyitayo
    Babalola, Gbemisola Olutosin
    Okebule, Toyin
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2024, 21 (02) : 326 - 337
  • [35] Design and Evaluation of Scalable Intrusion Detection System Using Machine Learning and Apache Spark
    Yogesh, K.
    Karthik, M.
    Naveen, T.
    Saravanan, S.
    [J]. 2019 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2019,
  • [36] Using Machine Learning and Big Data Analytics to Prioritize Outpatients in HetNets
    Hadi, Mohammed
    Lawey, Ahmed
    El-Gorashi, Taisir
    Elmirghani, Jaafar
    [J]. IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (IEEE INFOCOM 2019 WKSHPS), 2019, : 726 - 731
  • [37] Big data analytics and classification of cardiovascular disease using machine learning
    Narejo, Sanam
    Shaikh, Anoud
    Memon, Mehak Maqbool
    Mahar, Kainat
    Aleem, Zonera
    Zardari, Bisharat
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (02) : 2025 - 2033
  • [38] Hybrid Machine Learning-Based Approach for Anomaly Detection using Apache Spark
    Chliah, Hanane
    Battou, Amal
    Hadj, Maryem Ait el
    Laoufi, Adil
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (04) : 870 - 878
  • [39] A Theoretical Model for Big Data Analytics using Machine Learning Algorithms
    Sheshasaayee, Ananthi
    Lakshmi, J. V. N.
    [J]. PROCEEDING OF THE THIRD INTERNATIONAL SYMPOSIUM ON WOMEN IN COMPUTING AND INFORMATICS (WCI-2015), 2015, : 635 - 639
  • [40] A Structured Analysis to study the Role of Machine Learning and Deep Learning in The Healthcare Sector with Big Data Analytics
    Kumari, Juli
    Kumar, Ela
    Kumar, Deepak
    [J]. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING, 2023, 30 (06) : 3673 - 3701