Big Data Quality: A Quality Dimensions Evaluation

Times cited: 0
Authors
Taleb, Ikbal [1]
El Kassabi, Hadeel T. [1]
Serhani, Mohamed Adel [2]
Dssouli, Rachida [1]
Bouhaddioui, Chafik [3]
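Affiliations
[1] Concordia University, Concordia Institute for Information Systems Engineering, Montreal, QC, Canada
[2] United Arab Emirates University, College of Information Technology, Al Ain, United Arab Emirates
[3] United Arab Emirates University, College of Business and Economics, Al Ain, United Arab Emirates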
Keywords
Big Data; data quality dimensions; data quality evaluation; Big Data sampling
DOI
10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.145
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Data is the most valuable asset companies possess. When its quality degrades, the consequences are unpredictable and can lead to completely wrong insights. In the Big Data context, evaluating data quality is challenging, and it must be done before any Big Data analytics so that the analytics rest on some confidence in the data's quality. Given the huge size of the data and the speed at which it is generated, mechanisms and strategies are needed to evaluate and assess data quality quickly and efficiently. However, checking the quality of Big Data is a very costly process if it is applied to the entire data set. In this paper, we propose an efficient data quality evaluation scheme that applies sampling strategies to Big Data sets. Sampling reduces the data to representative samples of the population, which allows fast quality evaluation. The evaluation targeted data quality dimensions such as completeness and consistency. The experiments were conducted on a sleep disorder data set using Big Data bootstrap sampling techniques. The results showed that the mean quality score of the samples is representative of the original data, and they illustrate the importance of sampling for reducing computing costs in Big Data quality evaluation. We applied the generated quality results to the original data as quality proposals in order to improve its quality.
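The abstract names bootstrap sampling and per-dimension quality scores but gives no formulas. The following is a minimal sketch under our own assumptions, not the authors' implementation: completeness is taken as the fraction of non-missing cells, consistency as the fraction of records that pass a user-supplied rule, and the data are synthetic two-field records (hypothetical `age` and `hours_slept` fields) rather than the paper's sleep disorder data set. All function names and parameters are hypothetical and chosen for illustration.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical record layout: (age, hours_slept); None marks a missing value.
Record = Tuple[Optional[int], Optional[float]]


def completeness(records: Sequence[Record]) -> float:
    """Assumed definition: fraction of non-missing cells over all records."""
    total = sum(len(r) for r in records)
    present = sum(1 for r in records for v in r if v is not None)
    return present / total if total else 0.0


def consistency(records: Sequence[Record], rule: Callable[[Record], bool]) -> float:
    """Assumed definition: fraction of records that satisfy the given rule."""
    if not records:
        return 0.0
    return sum(1 for r in records if rule(r)) / len(records)


def hours_in_range(r: Record) -> bool:
    """Hypothetical consistency rule: hours slept, if present, lies in [0, 24]."""
    return r[1] is None or 0.0 <= r[1] <= 24.0


def bootstrap_scores(records: Sequence[Record],
                     metric: Callable[[Sequence[Record]], float],
                     n_samples: int, sample_size: int,
                     seed: int = 0) -> List[float]:
    """Score n_samples bootstrap samples, each drawn with replacement."""
    rng = random.Random(seed)
    return [metric(rng.choices(records, k=sample_size)) for _ in range(n_samples)]


def make_synthetic_records(n: int, seed: int = 42) -> List[Record]:
    """Synthetic stand-in data with injected missing and inconsistent values."""
    rng = random.Random(seed)
    records: List[Record] = []
    for _ in range(n):
        age = rng.randint(18, 90) if rng.random() > 0.05 else None
        hours = round(rng.uniform(2.0, 12.0), 1) if rng.random() > 0.10 else None
        if hours is not None and rng.random() < 0.02:
            hours = -hours  # inject an inconsistent (negative) value
        records.append((age, hours))
    return records


data = make_synthetic_records(10_000)
full_comp = completeness(data)
full_cons = consistency(data, hours_in_range)
boot_comp = bootstrap_scores(data, completeness, n_samples=200, sample_size=500)
boot_cons = bootstrap_scores(data, lambda s: consistency(s, hours_in_range),
                             n_samples=200, sample_size=500, seed=1)

print(f"completeness: full={full_comp:.4f} "
      f"bootstrap mean={statistics.mean(boot_comp):.4f} "
      f"stdev={statistics.stdev(boot_comp):.4f}")
print(f"consistency:  full={full_cons:.4f} "
      f"bootstrap mean={statistics.mean(boot_cons):.4f} "
      f"stdev={statistics.stdev(boot_cons):.4f}")
```

Because every bootstrap sample is drawn with replacement from the full data, the mean of the sample scores estimates the full-data score, and the spread of the sample scores indicates how much a single sample can be trusted. The cost saving only appears when the total sampled volume (`n_samples` × `sample_size`) is much smaller than the full data set; in this toy run it is not, since the point is only to compare the bootstrap mean with the full-data score. On a distributed platform the draws would typically come from something like PySpark's `DataFrame.sample(withReplacement=True, ...)`, which this sketch does not attempt.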
Pages: 759-765
Number of pages: 7