Big Data Quality Assessment Model for Unstructured Data

Authors
Taleb, Ikbal [1]
Serhani, Mohamed Adel [2]
Dssouli, Rachida [1]
Affiliations
[1] Concordia Univ, CIISE, Montreal, PQ, Canada
[2] UAE Univ, Coll Informat Technol, Al Ain, U Arab Emirates
Keywords
Big Data; Data Quality; Unstructured Data; Quality of Unstructured Big Data;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Big Data has gained enormous momentum over the past few years because of the tremendous volume of data generated and processed across diverse application domains. It is now estimated that 80% of all generated data is unstructured. Evaluating the quality of Big Data has been identified as essential for guaranteeing data quality dimensions such as completeness and accuracy. Current initiatives for unstructured data quality evaluation are still under investigation. In this paper, we propose a quality evaluation model to handle the quality of Unstructured Big Data (UBD). The model first captures and discovers the key properties and characteristics of unstructured big data, then provides comprehensive mechanisms to sample and profile the UBD dataset and to extract features and characteristics from heterogeneous data types in different formats. A data quality repository manages the relationships between data quality dimensions, quality metrics, feature extraction methods, mining methodologies, data types, and data domains. An analysis of the samples provides a data profile of the UBD. This profile is extended to a quality profile that contains the quality mapping with the selected features for quality assessment. We developed a UBD quality assessment model that handles all the processes from UBD profiling exploration to the quality report. The model provides an initial blueprint for quality estimation of unstructured Big Data. It also states a set of quality characteristics and indicators that can be used to outline an initial data quality schema for UBD.
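The flow described in the abstract (sample, profile, look up metrics in a quality repository, produce a report) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' model: the record layout, the `QUALITY_REPOSITORY` mapping, and the functions `sample`, `profile`, `completeness`, and `quality_report` are all hypothetical names introduced here, and completeness is computed in a deliberately simplistic way.

```python
import random
from collections import Counter


def completeness(records, required=("id", "type", "content")):
    """Fraction of records whose required fields are all non-empty (illustrative metric)."""
    if not records:
        return 0.0
    ok = sum(all(r.get(f) not in (None, "") for f in required) for r in records)
    return ok / len(records)


# Hypothetical quality repository: data type -> {quality dimension: metric function}.
QUALITY_REPOSITORY = {
    "text": {"completeness": completeness},
    "image": {"completeness": completeness},
}


def sample(records, k, seed=0):
    """Draw a reproducible random sample of at most k records."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


def profile(records):
    """Minimal data profile: number of sampled records per data type."""
    return dict(Counter(r.get("type", "unknown") for r in records))


def quality_report(records):
    """Apply the metrics registered for each data type found in the profile."""
    report = {}
    for dtype in profile(records):
        subset = [r for r in records if r.get("type", "unknown") == dtype]
        metrics = QUALITY_REPOSITORY.get(dtype, {})
        report[dtype] = {name: fn(subset) for name, fn in metrics.items()}
    return report


# Toy dataset (assumed for illustration only).
records = [
    {"id": 1, "type": "text", "content": "sensor log A"},
    {"id": 2, "type": "text", "content": ""},
    {"id": 3, "type": "image", "content": "img_003.png"},
    {"id": 4, "type": "image", "content": None},
]
report = quality_report(sample(records, k=4))
print(report)
```

Keeping the metric functions in a lookup table keyed by data type mirrors the idea of a repository that relates dimensions, metrics, and data types, so new types or dimensions could be added without changing the reporting loop.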
Pages: 69-74
Page count: 6