Optimising data quality of a data warehouse using data purgation process

Cited by: 0
Authors
Gupta, Neha [1]
Affiliations
[1] Manav Rachna Int Inst Res & Studies, Fac Comp Applicat, Faridabad 121002, India
Keywords
data warehouse; DW; data quality; DQ; extract, transform and load; ETL; data purgation; DP; BIG DATA; PREDICTION; MANAGEMENT; IMPUTATION; FRAMEWORK
DOI
10.1504/IJDMMM.2023.129961
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The rapid growth of data collection and storage services has affected data quality. The data purgation process helps maintain and improve data quality when data passes through the extract, transform and load (ETL) methodology. Metadata may contain unnecessary information, which can take the form of dummy values, cryptic values or missing values. The present work improves the EM algorithm with the dot product to handle cryptic data, implements the DBSCAN method with the Gower metric to handle dummy values, applies Ward's algorithm with the Minkowski distance to improve the results for contradicting data, and applies the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics improved data quality and helped provide consistent data for loading into a data warehouse. The proposed algorithms helped maintain the accuracy, integrity, consistency and non-redundancy of data in a timely manner.
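The abstract does not spell out how K-means and the Euclidean distance are combined for missing values, so the following is only a minimal sketch under stated assumptions, not the paper's algorithm: complete rows are clustered with K-means, each incomplete row is assigned to the nearest centroid using the Euclidean distance over its observed columns, and the gaps are filled from that centroid. The names `kmeans`, `impute_missing`, and the toy `rows` data are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[List[float]], k: int, iters: int = 50) -> List[List[float]]:
    """Plain K-means with deterministic farthest-point seeding (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from the existing ones.
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters: List[List[List[float]]] = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[idx].append(p)
        # Recompute each centroid as the column-wise mean of its members.
        new = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def impute_missing(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill None entries from the nearest centroid (hypothetical helper)."""
    complete = [list(r) for r in rows if None not in r]
    centroids = kmeans(complete, k)
    filled = []
    for r in rows:
        observed = [i for i, v in enumerate(r) if v is not None]
        # Distance is measured over the observed columns only.
        nearest = min(
            centroids,
            key=lambda c: euclidean([r[i] for i in observed], [c[i] for i in observed]),
        )
        filled.append([nearest[i] if v is None else v for i, v in enumerate(r)])
    return filled


# Toy data: two well-separated groups, plus two rows with a missing value (None).
rows: List[Row] = [
    [1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
    [9.0, 10.0], [9.2, 9.8], [8.8, 10.2],
    [1.1, None], [None, 9.9],
]
filled = impute_missing(rows, k=2)
print(filled[6], filled[7])
```

In this sketch the centroid value stands in for the missing entry; a real ETL pipeline would likely also scale the columns before clustering, since the Euclidean distance is sensitive to units.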
Pages: 102 - 131
Page count: 31
Related papers
50 in total
  • [1] Data Quality in Data Warehouse Systems
    Serra, Flavia
    Marotta, Adriana
    PROCEEDINGS OF THE 2016 XLII LATIN AMERICAN COMPUTING CONFERENCE (CLEI), 2016,
  • [2] Data Warehouse and Data Quality - An Overview
    Brajkovic, Helena
    Jaksic, Danijela
    Poscic, Patrizia
    CENTRAL EUROPEAN CONFERENCE ON INFORMATION AND INTELLIGENT SYSTEMS (CECIIS 2020), 2020, : 17 - 24
  • [3] Understanding data quality in a data warehouse
    Shanks, G
    Darke, P
    AUSTRALIAN COMPUTER JOURNAL, 1998, 30 (04): : 122 - 128
  • [4] Research on Data Quality of Data Warehouse
    Liu Shuanghong
    Han Zhongjun
    EBM 2010: INTERNATIONAL CONFERENCE ON ENGINEERING AND BUSINESS MANAGEMENT, VOLS 1-8, 2010, : 5255 - 5258
  • [5] Improving OLTP data quality using data warehouse mechanisms
    Jarke, M
    Quix, C
    Blees, G
    Lehmann, D
    Michalk, G
    Stierl, S
    SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999: SIGMOD99: PROCEEDINGS OF THE 1999 ACM SIGMOD - INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 1999, : 536 - 537
  • [6] Data warehouse quality
    Strategic Syst, 1 (07):
  • [7] Enhancing data quality in data warehouse environments
    Ballou, DP
    Tayi, GK
    COMMUNICATIONS OF THE ACM, 1999, 42 (01) : 73 - 78
  • [8] Cacophonic contributions to data quality in the data warehouse
    Rasmussen, Karsten Boye
    WMSCI 2005: 9TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL 7, 2005, : 311 - 316
  • [9] Data Warehouse Quality Assessment Using Contexts
    Serra, Flavia
    Marotta, Adriana
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2016, PT II, 2016, 10042 : 436 - 448
  • [10] Data warehouse process management
    Vassiliadis, P
    Quix, C
    Vassiliou, Y
    Jarke, M
    INFORMATION SYSTEMS, 2001, 26 (03) : 205 - 236