Big Data Pre-Processing: A Quality Framework

Cited by: 65
Authors
Taleb, Ikbal [1]
Dssouli, Rachida [1]
Serhani, Mohamed Adel [2]
Affiliations
[1] Concordia Univ, CIISE, Montreal, PQ, Canada
[2] UAE Univ, Coll Informat Technol, Al Ain, U Arab Emirates
Keywords
Big Data; Data Quality; Pre-processing; Data Provenance; Challenges; Management; Analytics
DOI
10.1109/BigDataCongress.2015.35
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
With the abundance of raw data generated from various sources, Big Data has become a preeminent approach to acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable evidence. The size, speed, and formats in which data is generated and processed affect the overall quality of information. Therefore, Quality of Big Data (QBD) has become an important factor in ensuring that data quality is maintained throughout all Big Data processing phases. This paper addresses QBD in the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization. We propose a QBD model incorporating processes that support data quality profile selection and adaptation. In addition, the model tracks and records, in a data provenance repository, the effect of every data transformation performed during the pre-processing phase. We evaluate the data quality selection module using a large EEG dataset. The results illustrate the importance of addressing QBD early in the Big Data processing lifecycle, since doing so significantly reduces costs and supports accurate data analysis.
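The abstract combines two ideas: selecting data according to a quality profile, and logging the quality effect of each pre-processing transformation in a provenance record. The sketch below illustrates both under loudly stated assumptions. It is not the authors' QBD model or implementation. Completeness (share of non-missing values) stands in for whatever quality dimensions a real profile would specify; the per-attribute thresholds, the function names (`select_attributes`, `cleanse`, `filter_range`, `normalize`, `run_pipeline`), the `ProvenanceEntry` fields, and the toy channel data are all hypothetical and not taken from the paper or its EEG dataset.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical record type: attribute name (e.g. a channel) -> value; None marks a missing value.
Record = Dict[str, Optional[float]]
Step = Callable[[List[Record]], List[Record]]


def completeness(records: List[Record], attrs: List[str]) -> float:
    """Fraction of non-missing values across the given attributes (one common quality dimension)."""
    total = len(records) * len(attrs)
    if total == 0:
        return 1.0
    present = sum(1 for r in records for a in attrs if r.get(a) is not None)
    return present / total


def select_attributes(records: List[Record], profile: Dict[str, float]) -> List[str]:
    """Hypothetical profile selection: keep attributes whose completeness meets their threshold."""
    return [a for a, threshold in profile.items() if completeness(records, [a]) >= threshold]


@dataclass
class ProvenanceEntry:
    """One hypothetical provenance record describing the effect of a single transformation."""
    step: str
    rows_before: int
    rows_after: int
    completeness_before: float
    completeness_after: float


def cleanse(attrs: List[str]) -> Step:
    """Drop records that have a missing value in any selected attribute."""
    def step(records: List[Record]) -> List[Record]:
        return [r for r in records if all(r.get(a) is not None for a in attrs)]
    return step


def filter_range(attr: str, low: float, high: float) -> Step:
    """Keep records whose value for `attr` lies in the closed interval [low, high]."""
    def step(records: List[Record]) -> List[Record]:
        return [r for r in records if r.get(attr) is not None and low <= r[attr] <= high]
    return step


def normalize(attrs: List[str]) -> Step:
    """Min-max scale each selected attribute to [0, 1], working on copies of the records."""
    def step(records: List[Record]) -> List[Record]:
        out = [dict(r) for r in records]
        for a in attrs:
            values = [r[a] for r in out if r.get(a) is not None]
            if not values:
                continue
            lo, hi = min(values), max(values)
            for r in out:
                if r.get(a) is not None:
                    r[a] = 0.0 if hi == lo else (r[a] - lo) / (hi - lo)
        return out
    return step


def run_pipeline(records: List[Record], attrs: List[str],
                 steps: List[Tuple[str, Step]], log: List[ProvenanceEntry]) -> List[Record]:
    """Apply each step in order and append its before/after effect to the provenance log."""
    for name, step in steps:
        rows_before, q_before = len(records), completeness(records, attrs)
        records = step(records)
        log.append(ProvenanceEntry(name, rows_before, len(records),
                                   q_before, completeness(records, attrs)))
    return records


# Toy data (hypothetical, not from the paper's EEG dataset).
rows: List[Record] = [
    {"ch1": 1.0, "ch2": 4.0, "ch3": None},
    {"ch1": 3.0, "ch2": None, "ch3": None},
    {"ch1": 5.0, "ch2": 8.0, "ch3": 2.0},
    {"ch1": 7.0, "ch2": 6.0, "ch3": None},
]
profile = {"ch1": 0.9, "ch2": 0.7, "ch3": 0.5}  # hypothetical minimum completeness per attribute

selected = select_attributes(rows, profile)  # ch3 is only 25% complete, so it is not selected
log: List[ProvenanceEntry] = []
result = run_pipeline(rows, selected, [
    ("cleansing", cleanse(selected)),
    ("filtering", filter_range("ch1", 0.0, 6.0)),
    ("normalization", normalize(selected)),
], log)
```

Each step is a plain function that returns a new list. This lets the log compare quality before and after every transformation without touching the input data. In this toy run, cleansing raises completeness over the selected attributes from 0.875 to 1.0 and reduces four rows to three. A real quality profile would typically track several dimensions rather than completeness alone.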
Pages: 191-198
Number of pages: 8
Related papers (50 in total)
  • [41] Application of pre-processing of NIRS modeling data
    Wang Zhihong
    Lin Jun
    PROCEEDINGS OF THE FIRST INTERNATIONAL SYMPOSIUM ON TEST AUTOMATION & INSTRUMENTATION, VOLS 1 - 3, 2006, : 295 - 298
  • [42] Parallel Pre-processing of Affymetrix Microarray Data
    Guzzi, Pietro Hiram
    Cannataro, Mario
    EURO-PAR 2010 PARALLEL PROCESSING WORKSHOPS, 2011, 6586 : 225 - 232
  • [43] SumatraTT: a generic data pre-processing system
    Aubrecht, P
    Miksovsky, P
    Král, L
    14TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2003, : 120 - 124
  • [44] A study on data pre-processing in reverse engineering
    Liu Deping
    Shangguan Jianlin
    Chen Jianjun
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MECHANICAL TRANSMISSIONS, VOLS 1 AND 2, 2006, : 1428 - 1432
  • [45] NanoStringNormCNV: pre-processing of NanoString CNV data
    Sendorek, Dorota H.
    Lalonde, Emilie
    Yao, Cindy Q.
    Sabelnykova, Veronica Y.
    Bristow, Robert G.
    Boutros, Paul C.
    BIOINFORMATICS, 2018, 34 (06) : 1034 - 1036
  • [46] A Pre-processing framework for spectral classification of hyperspectral images
    Singh, Simranjit
    Kasana, Singara Singh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (01) : 243 - 261
  • [47] INTEGRATING PRE-PROCESSING PIPELINES IN ODC BASED FRAMEWORK
    Otamendi, U.
    Azpiroz, I.
    Quartulli, M.
    Olaizola, I.
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 4094 - 4097
  • [48] Data pre-processing for obstacle in automotive applications
    Wahl, M
    Georges, D
    Dang, M
    IEEE CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, 1997, : 409 - 414
  • [49] An intelligent data pre-processing of complex datasets
    Abdul-Rahman, Shuzlina
    Abu Bakar, Azuraliza
    Mohamed-Hussein, Zeti-Azura
    INTELLIGENT DATA ANALYSIS, 2012, 16 (02) : 305 - 325
  • [50] Pre-processing of RDF data for METIS partitioning
    Benhamed, S.
    Nait-Bahloul, S.
    International Journal of Metadata, Semantics and Ontologies, 2023, 16 (02) : 152 - 171