Cleaning Big Data Streams: A Systematic Literature Review

Cited by: 3
Authors
Alotaibi, Obaid [1, 2]
Pardede, Eric [2]
Tomy, Sarath [3]
Bagui, Sikha
Iacono, Mauro
Affiliations
[1] Shaqra Univ, Coll Arts & Sci, Dept Comp Sci, Sajir Campus, Sajir City 11951, Saudi Arabia
[2] La Trobe Univ, Sch Engn & Math Sci, Dept Comp Sci & Informat Technol, Melbourne Campus, Melbourne, Vic 3086, Australia
[3] La Trobe Univ, Sch Engn & Math Sci, Dept Comp Sci & Informat Technol, Bendigo Campus, Flora Hill, Vic 3552, Australia
Keywords
clean; big data; stream; machine learning; deep learning; artificial intelligence; missing value; outliers; duplicate data; irrelevant data; OUTLIER DETECTION; ANOMALY DETECTION; FRAMEWORK
DOI
10.3390/technologies11040101
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
In today's big data era, cleaning big data streams has become a challenging task because of the varied formats of big data and the massive volume being generated. Many studies have proposed techniques to overcome these challenges, such as cleaning big data in real time. This systematic literature review presents recently developed techniques used in the cleaning process and for each data cleaning issue. Following the PRISMA framework, we search four databases, namely IEEE Xplore, ACM Library, Scopus, and Science Direct, to select relevant studies. From the selected studies, we identify the techniques used to clean big data streams and the evaluation methods used to examine their efficiency. We also define the issues that may appear during the cleaning process, namely missing values, duplicated data, outliers, and irrelevant data. Based on our study, we identify future directions for cleaning big data streams.
Pages: 24
Related papers
  • [1] High-Speed Big Data Streams: A Literature Review
    Sneha, R. Patil
    Nagaraj, V. Dharwadkar
    [J]. SECOND INTERNATIONAL CONFERENCE ON COMPUTER NETWORKS AND COMMUNICATION TECHNOLOGIES, ICCNCT 2019, 2020, 44 : 308 - 316
  • [2] Data cleaning and machine learning: a systematic literature review
    Cote, Pierre-Olivier
    Nikanjam, Amin
    Ahmed, Nafisa
    Humeniuk, Dmytro
    Khomh, Foutse
    [J]. AUTOMATED SOFTWARE ENGINEERING, 2024, 31 (02)
  • [3] Hierarchical classification of data streams: a systematic literature review
    Tieppo, Eduardo
    dos Santos, Roger Robson
    Barddal, Jean Paul
    Nievola, Julio Cesar
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 (04) : 3243 - 3282
  • [4] Hierarchical classification of data streams: a systematic literature review
    Eduardo Tieppo
    Roger Robson dos Santos
    Jean Paul Barddal
    Júlio Cesar Nievola
    [J]. Artificial Intelligence Review, 2022, 55 : 3243 - 3282
  • [5] BIG DATA ARCHITECTURES FOR DATA LAKES: A SYSTEMATIC LITERATURE REVIEW
    Ramchand, Sonam
    Mahmood, Tariq
    [J]. 2022 IEEE 46TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2022), 2022, : 1141 - 1146
  • [6] Big data analytics in healthcare: a systematic literature review
    Khanra, Sayantan
    Dhir, Amandeep
    Islam, Najmul
    Mantymaki, Matti
    [J]. ENTERPRISE INFORMATION SYSTEMS, 2020, 14 (07) : 878 - 912
  • [7] 15 years of Big Data: a systematic literature review
    Tosi, Davide
    Kokaj, Redon
    Roccetti, Marco
    [J]. JOURNAL OF BIG DATA, 2024, 11 (01)
  • [8] Security and Privacy for Big Data: A Systematic Literature Review
    Nelson, Boel
    Olovsson, Tomas
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 3693 - 3702
  • [9] A Systematic Literature Review of Big Data and the Hadoop frameworks
    Naidu, Devishree
    Thakur, Adi
    [J]. INTERNATIONAL JOURNAL OF EARLY CHILDHOOD SPECIAL EDUCATION, 2022, 14 (02) : 2969 - 2973
  • [10] Manufacturing big data ecosystem: A systematic literature review
    Cui, Yesheng
    Kara, Sami
    Chan, Ka C.
    [J]. ROBOTICS AND COMPUTER-INTEGRATED MANUFACTURING, 2020, 62