Big Data Quality Assessment Model for Unstructured Data

Cited by: 0
Authors
Taleb, Ikbal [1]
Serhani, Mohamed Adel [2]
Dssouli, Rachida [1]
Affiliations
[1] Concordia Univ, CIISE, Montreal, PQ, Canada
[2] UAE Univ, Coll Informat Technol, Al Ain, U Arab Emirates
Keywords
Big Data; Data Quality; Unstructured Data; Quality of Unstructured Big Data
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Big Data has gained enormous momentum over the past few years because of the tremendous volume of data generated and processed across diverse application domains. It is now estimated that 80% of all generated data is unstructured. Evaluating the quality of Big Data has been identified as essential to guaranteeing data quality dimensions such as completeness and accuracy. Initiatives for evaluating the quality of unstructured data are still under investigation. In this paper, we propose a quality evaluation model to handle the quality of Unstructured Big Data (UBD). The model first captures and discovers the key properties and characteristics of unstructured big data, then provides comprehensive mechanisms to sample and profile the UBD dataset and to extract features and characteristics from heterogeneous data types in different formats. A Data Quality repository manages the relationships between data quality dimensions, quality metrics, feature extraction methods, mining methodologies, data types, and data domains. An analysis of the samples provides a data profile of the UBD. This profile is extended to a quality profile that contains the quality mapping with the features selected for quality assessment. We developed a UBD quality assessment model that handles all the processes from UBD profiling and exploration to the quality report. The model provides an initial blueprint for quality estimation of unstructured Big Data. It also states a set of quality characteristics and indicators that can be used to outline an initial data quality schema for UBD.
Pages: 69-74
Page count: 6