Big Data Quality: A Quality Dimensions Evaluation

Cited by: 0
Authors
Taleb, Ikbal [1]
El Kassabi, Hadeel T. [1]
Serhani, Mohamed Adel [2]
Dssouli, Rachida [1]
Bouhaddioui, Chafik [3]
Affiliations
[1] Concordia Univ, Concordia Inst Informat Syst Engn, Montreal, PQ, Canada
[2] UAE Univ, Coll Informat Technol, Al Ain, U Arab Emirates
[3] UAE Univ, Coll Business & Econ, Al Ain, U Arab Emirates
Keywords
Big Data; data quality dimensions; data quality evaluation; Big Data sampling
DOI
10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.145
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Data is the most valuable asset a company owns. When its quality degrades, the consequences are unpredictable and can lead to completely wrong insights. In a Big Data context, evaluating data quality is challenging, and it must be done before any Big Data analytics so that some confidence in the data quality can be provided. Given the huge size of the data and the speed at which it is generated, mechanisms and strategies are needed to evaluate and assess data quality quickly and efficiently. However, checking the quality of Big Data is very costly if it is applied to the entire data set. In this paper, we propose an efficient data quality evaluation scheme that applies sampling strategies to Big Data sets. Sampling reduces the data to representative population samples for fast quality evaluation. The evaluation targets selected data quality dimensions, such as completeness and consistency. The experiments were conducted on a sleep disorder data set using Big Data bootstrap sampling techniques. The results show that the mean quality score of the samples is representative of the original data, and they illustrate the importance of sampling for reducing computing costs in Big Data quality evaluation. We applied the generated quality results as quality proposals to the original data to increase its quality.
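The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`completeness`, `bootstrap_completeness`), the synthetic data, the 10% missing-value rate, and the sample sizes are all assumptions chosen for illustration. It draws bootstrap samples (rows sampled with replacement), scores each sample on one dimension (completeness, taken here as the fraction of non-missing cells), and compares the mean sample score with the score computed on the full data.

```python
import random
from statistics import mean


def completeness(rows):
    """Fraction of cells that are not missing (None counts as missing)."""
    cells = [value for row in rows for value in row]
    return sum(value is not None for value in cells) / len(cells)


def bootstrap_completeness(rows, sample_size, n_samples, seed=0):
    """Mean completeness over n_samples bootstrap samples of sample_size rows."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        # random.choices samples with replacement, as bootstrap sampling requires.
        sample = rng.choices(rows, k=sample_size)
        scores.append(completeness(sample))
    return mean(scores), scores


# Hypothetical data: 10,000 rows of 5 attributes, about 10% of cells missing.
data_rng = random.Random(42)
data = [
    [data_rng.random() if data_rng.random() > 0.1 else None for _ in range(5)]
    for _ in range(10_000)
]

full_score = completeness(data)
estimate, sample_scores = bootstrap_completeness(data, sample_size=500, n_samples=30)
print(f"full data: {full_score:.4f}  bootstrap mean: {estimate:.4f}")
```

Each sample here touches 500 rows instead of 10,000, which is where the cost saving comes from. A consistency score could be estimated the same way by replacing `completeness` with a function that returns the fraction of rows satisfying a given rule.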
Pages: 759-765
Page count: 7