Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data

Cited by: 4
Authors
Antas, Joao [1]
Rocha Silva, Rodrigo [2, 3]
Bernardino, Jorge [1, 2]
Affiliations
[1] Coimbra Inst Engn ISEC, Polytech Coimbra, P-3030199 Coimbra, Portugal
[2] Ctr Informat & Syst Univ Coimbra CISUC, P-3030290 Coimbra, Portugal
[3] Sao Paulo Technol Coll, FATEC Mogi Cruzes, BR-08773600 Mogi Das Cruzes, SP, Brazil
Keywords
Big Data; COVID-19; Data Mining; SQL and NoSQL databases
DOI
10.3390/computers11020029
Chinese Library Classification (CLC)
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
COVID-19 has had an enormous negative impact on human lives and the world economy. To support the fight against the pandemic, this study evaluates several database systems and selects the most suitable one for storing, handling, and mining COVID-19 data. We evaluate SQL and NoSQL database systems using the following metrics: query runtime, memory used, CPU used, and storage size. The database systems assessed were Microsoft SQL Server, MongoDB, and Cassandra. We also evaluate data mining algorithms, including Decision Trees, Random Forest, Naive Bayes, and Logistic Regression, through data classification tests in the Orange Data Mining software. The classification tests were performed using cross-validation on a table with about 3 million records of COVID-19 exams and patients' symptoms. Random Forest obtained the best average accuracy, recall, precision, and F1 score in the COVID-19 predictive model built in the mining stage. In the performance evaluation, MongoDB presented the best results in almost all tests with a large data volume.
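The four classification scores named in the abstract, and the cross-validation splits they are averaged over, can be sketched in a few lines of plain Python. This is only an illustration, not the study's procedure: the authors used Orange Data Mining, and the abstract does not state the number of folds or how the splits were drawn. The function names `binary_metrics` and `k_fold_indices`, the 0/1 label encoding, and the contiguous (unshuffled) folds are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; return (train, test) index lists.

    Hypothetical helper: real tools usually shuffle or stratify first.
    """
    # The first n % k folds get one extra record so that sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds
```

In a cross-validated run, a model would be trained on each `train` index list, scored with `binary_metrics` on the matching `test` list, and the per-fold scores averaged, which is what "best average accuracy, recall, precision, and F1 score" refers to.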
Pages: 23