A Review on Data Cleansing Methods for Big Data

Cited by: 64
Authors
Ridzuan, Fakhitah [1]
Zainon, Wan Mohd Nazmee Wan [1]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, George Town 11800, Malaysia
Keywords
data cleansing; big data; data quality
DOI
10.1016/j.procs.2019.11.177
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Organizations have massive amounts of data available to them, and these data influence their business decisions. Data collected from various sources are dirty, which affects the accuracy of prediction results. Data cleansing improves data quality and helps organizations make sure their data are ready for the analysis phase. However, the amount of data collected by organizations has been increasing every year, so most existing methods are no longer suitable for big data. The data cleansing process mainly consists of identifying errors, detecting them, and correcting them. Although the data need to be analyzed quickly, the data cleansing process is complex and time-consuming because it must ensure that the cleansed data are of better quality. The importance of the domain expert in the data cleansing process is undeniable, as verification and validation are the main concerns for the cleansed data. This paper reviews the data cleansing process, the challenges of data cleansing for big data, and the available data cleansing methods. (C) 2019 The Authors. Published by Elsevier B.V.
Pages: 731-738
Number of pages: 8
Related papers
50 in total
  • [1] BigDansing: A System for Big Data Cleansing
    Khayyat, Zuhair
    Ilyas, Ihab F.
    Jindal, Alekh
    Madden, Samuel
    Ouzzani, Mourad
    Papotti, Paolo
    Quiane-Ruiz, Jorge-Arnulfo
    Tang, Nan
    Yin, Si
    [J]. SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015, : 1215 - 1230
  • [2] Analytical Review of Data Visualization Methods in Application to Big Data
    Gorodov, Evgeniy Yur'evich
    Gubarev, Vasiliy Vasil'evich
    [J]. JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING, 2013, 2013
  • [3] Big Data: A Review of Analytics Methods & Techniques
    Arora, Yojna
    Goyal, Dinesh
    [J]. PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2016, : 225 - 230
  • [4] Data cleansing mechanisms and approaches for big data analytics: a systematic study
    Hosseinzadeh, Mehdi
    Azhir, Elham
    Ahmed, Omed Hassan
    Ghafour, Marwan Yassin
    Ahmed, Sarkar Hasan
    Rahmani, Amir Masoud
    Vo, Bay
    [J]. JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 14 (1) : 99 - 111
  • [5] Data cleansing mechanisms and approaches for big data analytics: a systematic study
    Mehdi Hosseinzadeh
    Elham Azhir
    Omed Hassan Ahmed
    Marwan Yassin Ghafour
    Sarkar Hasan Ahmed
    Amir Masoud Rahmani
    Bay Vo
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2023, 14 : 99 - 111
  • [6] Customized Eager-Lazy Data Cleansing for Satisfactory Big Data Veracity
    Moussa, Rim
    Sahri, Soror
    [J]. IDEAS 2021: 25TH INTERNATIONAL DATABASE ENGINEERING & APPLICATIONS SYMPOSIUM, 2021, : 157 - 165
  • [7] Big data in public transportation: a review of sources and methods
    Welch, Timothy F.
    Widita, Alyas
    [J]. TRANSPORT REVIEWS, 2019, 39 (06) : 795 - 818
  • [8] The Review of Big Data
    Shi, Chunhe
    Wu, Chengdong
    Han, Xiaowei
    Li, Zhen
    Xie, Yinghong
    [J]. PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ELECTRONIC, MECHANICAL, INFORMATION AND MANAGEMENT SOCIETY (EMIM), 2016, 40 : 108 - 112
  • [9] Big Data: A Review
    Sagiroglu, Seref
    Sinanc, Duygu
    [J]. PROCEEDINGS OF THE 2013 INTERNATIONAL CONFERENCE ON COLLABORATION TECHNOLOGIES AND SYSTEMS (CTS), 2013, : 42 - 47
  • [10] A Detailed Review on the Prominent Compression Methods Used for Reducing the Data Volume of Big Data
    Anuradha D.
    Bhuvaneswari S.
    [J]. Annals of Data Science, 2016, 3 (01) : 47 - 62