Cleaning Big Data Streams: A Systematic Literature Review

Cited by: 6
Authors
Alotaibi, Obaid [1, 2]
Pardede, Eric [2]
Tomy, Sarath [3]
Bagui, Sikha
Iacono, Mauro
Affiliations
[1] Shaqra Univ, Coll Arts & Sci, Dept Comp Sci, Sajir Campus, Sajir City 11951, Saudi Arabia
[2] La Trobe Univ, Sch Engn & Math Sci, Dept Comp Sci & Informat Technol, Melbourne Campus, Melbourne, Vic 3086, Australia
[3] La Trobe Univ, Sch Engn & Math Sci, Dept Comp Sci & Informat Technol, Bendigo Campus, Flora Hill, Vic 3552, Australia
Keywords
clean; big data; stream; machine learning; deep learning; artificial intelligence; missing value; outliers; duplicate data; irrelevant data; outlier detection; anomaly detection; framework
DOI
10.3390/technologies11040101
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
In today's big data era, cleaning big data streams has become a challenging task because of the variety of data formats and the massive volume of data being generated. Many studies have proposed techniques to overcome these challenges, such as cleaning big data in real time. This systematic literature review presents recently developed techniques for the cleaning process and for each data cleaning issue. Following the PRISMA framework, four databases (IEEE Xplore, ACM Library, Scopus, and Science Direct) are searched to select relevant studies. From the selected studies, we identify the techniques that have been used to clean big data streams and the evaluation methods used to examine their efficiency. We also define the cleaning issues that may appear during the cleaning process, namely missing values, duplicated data, outliers, and irrelevant data. Based on our study, future directions for cleaning big data streams are identified.
Pages: 24
Related papers
50 in total
  • [31] Challenges and Issues in Unstructured Big Data: A Systematic Literature Review
    Nafis, Nur Syafiqah Mohd
    Awang, Suryanti
    ADVANCED SCIENCE LETTERS, 2018, 24 (10) : 7716 - 7722
  • [32] Big data and sentiment analysis: A comprehensive and systematic literature review
    Hajiali, Mahdi
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, 32 (14)
  • [33] Task Scheduling in Big Data Platforms: A Systematic Literature Review
    Soualhia, Mbarka
    Khomh, Foutse
    Tahar, Sofiene
    JOURNAL OF SYSTEMS AND SOFTWARE, 2017, 134 : 170 - 189
  • [34] A Systematic Literature Review of Novelty Detection in Data Streams: Challenges and Opportunities
    Gaudreault, Jean-Gabriel
    Branco, Paula
    ACM COMPUTING SURVEYS, 2024, 56 (10)
  • [35] State of the art on quality control for data streams: A systematic literature review
    Mirzaie, Mostafa
    Behkamal, Behshid
    Allahbakhsh, Mohammad
    Paydar, Samad
    Bertino, Elisa
    COMPUTER SCIENCE REVIEW, 2023, 48
  • [36] Quality Assurance Technologies of Big Data Applications: A Systematic Literature Review
    Ji, Shunhui
    Li, Qingqiu
    Cao, Wennan
    Zhang, Pengcheng
    Muccini, Henry
    APPLIED SCIENCES-BASEL, 2020, 10 (22) : 1 - 31
  • [37] Systematic Review of the Literature on Big Data in the Transportation Domain: Concepts and Applications
    Neilson, Alex
    Indratmo
    Daniel, Ben
    Tjandra, Stevanus
    BIG DATA RESEARCH, 2019, 17 : 35 - 44
  • [38] Big data analytics capabilities: a systematic literature review and research agenda
    Mikalef, Patrick
    Pappas, Ilias O.
    Krogstie, John
    Giannakos, Michail
    Information Systems and e-Business Management, 2018, 16 : 547 - 578
  • [39] Factors impacting the adoption of big data in healthcare: A systematic literature review
    Al Teneiji, Abeer Saleh
    Abu Salim, Taghreed Yahia
    Riaz, Zainab
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2024, 187
  • [40] Business intelligence and big data in hospitality and tourism: a systematic literature review
    Mariani, Marcello
    Baggio, Rodolfo
    Fuchs, Matthias
    Hoepken, Wolfram
    INTERNATIONAL JOURNAL OF CONTEMPORARY HOSPITALITY MANAGEMENT, 2018, 30 (12) : 3514 - 3554