Optimising data quality of a data warehouse using data purgation process

Authors
Gupta, Neha [1]
Affiliations
[1] Manav Rachna Int Inst Res & Studies, Fac Comp Applicat, Faridabad 121002, India
Keywords
data warehouse; DW; data quality; DQ; extract, transform and load; ETL; data purgation; DP; BIG DATA; PREDICTION; MANAGEMENT; IMPUTATION; FRAMEWORK
DOI
10.1504/IJDMMM.2023.129961
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The rapid growth of data collection and storage services has affected the quality of data. The data purgation process helps maintain and improve data quality when data is subjected to the extract, transform and load (ETL) methodology. Metadata may contain unnecessary information, which can be classified as dummy values, cryptic values or missing values. The present work improves the EM algorithm with a dot product to handle cryptic data, implements the DBSCAN method with the Gower metric to handle dummy values, applies Ward's algorithm with the Minkowski distance to improve results for contradicting data, and applies the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics improved data quality and helped provide consistent data for loading into a data warehouse. The proposed algorithms helped maintain the accuracy, integrity, consistency and non-redundancy of data in a timely manner.
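The abstract does not give the details of the paper's algorithms, so the following is only a minimal sketch of the general idea behind K-means with Euclidean distance for missing values, not the author's method. It assumes numeric rows where `None` marks a missing entry; the function names `euclidean`, `kmeans` and `impute`, and the choice to seed centroids with the first `k` complete rows, are illustrative assumptions.

```python
import math

def euclidean(a, b, dims):
    """Euclidean distance between a and b, restricted to the given dimensions."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))

def kmeans(rows, k, iters=20):
    """Plain Lloyd's k-means on complete rows.

    Assumption: the first k rows seed the centroids (a simplification;
    real pipelines usually use random or k-means++ seeding), and
    len(rows) >= k.
    """
    centroids = [list(r) for r in rows[:k]]
    dims = range(len(rows[0]))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda c: euclidean(r, centroids[c], dims))
            clusters[nearest].append(r)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def impute(rows, k=2):
    """Fill None entries with the matching coordinate of the nearest centroid.

    Centroids are fitted on complete rows only; an incomplete row is
    matched to a centroid using only its observed dimensions.
    """
    complete = [r for r in rows if None not in r]
    centroids = kmeans(complete, k)
    filled = []
    for r in rows:
        observed = [d for d, v in enumerate(r) if v is not None]
        nearest = min(centroids, key=lambda c: euclidean(r, c, observed))
        filled.append([nearest[d] if v is None else v for d, v in enumerate(r)])
    return filled

if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups, two incomplete rows.
    data = [[1.0, 2.0], [10.0, 11.0], [1.2, 1.8], [9.8, 11.2],
            [1.1, None], [None, 10.9]]
    for row in impute(data, k=2):
        print(row)
```

Fitting centroids on complete rows only keeps the sketch short; it would need at least `k` complete rows and would benefit from feature scaling before distances are computed on real warehouse data.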
Pages: 102-131
Page count: 31