Optimising data quality of a data warehouse using data purgation process

Cited by: 0

Authors:

Gupta, Neha [1]

Affiliations:

[1] Manav Rachna Int Inst Res & Studies, Fac Comp Applicat, Faridabad 121002, India

Source:

INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT | 2023 / Vol. 15 / Issue 01

Keywords:

data warehouse; DW; data quality; DQ; extract, transform and load; ETL; data purgation; DP; BIG DATA; PREDICTION; MANAGEMENT; IMPUTATION; FRAMEWORK;

DOI:

10.1504/IJDMMM.2023.129961

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The rapid growth of data collection and storage services has affected the quality of the data. The data purgation process helps maintain and improve data quality when the data is subjected to the extract, transform and load (ETL) methodology. Metadata may contain unnecessary information, which can be classified as dummy values, cryptic values or missing values. The present work improves the EM algorithm with the dot product to handle cryptic data, implements the DBSCAN method with the Gower metric to handle dummy values, applies Ward's algorithm with the Minkowski distance to improve the results for contradicting data, and applies the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics have improved the data quality and helped provide consistent data for loading into a data warehouse. The proposed algorithms have helped maintain the accuracy, integrity, consistency and non-redundancy of data in a timely manner.
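
The abstract names K-means with the Euclidean distance as the approach for missing values but gives no algorithmic detail. The sketch below is one possible reading, not the paper's method: gaps are first filled with column means, rows are clustered with a plain K-means, and each gap is then replaced by the matching coordinate of its cluster centroid. The function names `euclidean` and `kmeans_impute`, the farthest-point initialisation and the iteration count are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_impute(rows, k, iters=20):
    """Return a copy of rows with None entries filled from K-means centroids.

    Hypothetical helper: an illustrative sketch, not the published algorithm.
    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])

    # Initial guess: fill each gap with the mean of the observed column values.
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [[col_means[j] if v is None else v for j, v in enumerate(r)]
              for r in rows]

    # Deterministic farthest-point initialisation (an assumption, chosen so
    # that the sketch needs no random seed).
    centroids = [list(filled[0])]
    while len(centroids) < k:
        farthest = max(
            filled,
            key=lambda p: min(euclidean(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    for _ in range(iters):
        # Assignment step: nearest centroid under the Euclidean distance.
        labels = [
            min(range(k), key=lambda c: euclidean(row, centroids[c]))
            for row in filled
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [filled[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Imputation step: overwrite only the originally missing cells.
        for i, row in enumerate(rows):
            for j, v in enumerate(row):
                if v is None:
                    filled[i][j] = centroids[labels[i]][j]
    return filled


if __name__ == "__main__":
    data = [[1.0, 1.0], [1.2, 0.8], [0.9, None],
            [10.0, 10.0], [10.2, 9.8], [None, 10.1]]
    for row in kmeans_impute(data, k=2):
        print([round(v, 3) for v in row])
```

In this sketch an imputed cell settles at the mean of the observed values of its cluster in that column, so a row that sits near a tight cluster receives a value typical of that cluster rather than the global column mean. Observed values are never modified and the input list is left untouched.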


Pages: 102 - 131

Page count: 31

50 records in total

[31] Data warehouse quality and agent technology
Jarke, M
COOPERATIVE INFORMATION AGENTS V, PROCEEDINGS, 2001, 2182 : 56 - 75
[32] Statistical quality control of warehouse data
Hinrichs, H
DATABASES AND INFORMATION SYSTEMS, 2001 : 69 - 84
[33] Enhanced extraction clinical data technique to improve data quality in clinical data warehouse
Mohammed, AbubakerElrazi O.
Talab, Samani A.
International Journal of Database Theory and Application, 2015, 8 (03) : 333 - 342
[34] Is your Data Warehouse successful? Developing a Data Warehouse process that responds to the needs of the enterprise.
Welbrock, PR
PROCEEDINGS OF THE TWENTY-THIRD ANNUAL SAS USERS GROUP INTERNATIONAL CONFERENCE, 1998 : 574 - 583
[35] Implementation of Change Data Capture in ETL Process for Data Warehouse Using HDFS and Apache Spark
Denny
Atmaja, I. Pulu Medagia
Saptawijaya, Ali
Aminah, Siti
2017 INTERNATIONAL WORKSHOP ON BIG DATA AND INFORMATION SECURITY (IWBIS 2017), 2017 : 49 - 55
[36] The data warehouse and data mining
Inmon, WH
COMMUNICATIONS OF THE ACM, 1996, 39 (11) : 49 - 50
[37] Data Warehouse and Data Virtualization
Mousa, Ayad Hameed
Shiratuddin, Norshuhada
PROCEEDINGS 2015 INTERNATIONAL CONFERENCE ON DEVELOPMENTS IN ESYSTEMS ENGINEERING DESE 2015, 2015 : 369 - 372
[38] Using portfolio theory for automatically processing information about data quality in data warehouse environments
Bruckner, RM
Schiefer, J
ADVANCES IN INFORMATION SYSTEMS, PROCEEDINGS, 2000, 1909 : 34 - 43
[39] Optimising and Predicting Performance of Industrial Filtrations using Process Data
Bahner, Franz D.
Santacoloma, Paloma A.
Abildskov, Jens
Huusom, Jakob K.
27TH EUROPEAN SYMPOSIUM ON COMPUTER AIDED PROCESS ENGINEERING, PT B, 2017, 40B : 1471 - 1476
[40] Comparing HiveQL and MapReduce Methods to Process Fact Data in a Data Warehouse
Pen, Haince Denis
Dsilva, Prajyoti
Mascarnes, Sweedle
2017 2ND INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS, COMPUTING AND IT APPLICATIONS (CSCITA), 2017 : 201 - 206