A FRAMEWORK FOR DATA CLEANING IN DATA WAREHOUSES

Citations: 0
Authors
Peng, Taoxin [1]
Affiliation
[1] Napier Univ, Sch Comp, Edinburgh EH10 5DT, Midlothian, Scotland
Keywords
Data Cleaning; Data Quality; Data Integration; Data Warehousing
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Achieving high data quality in data warehouses is a persistent challenge, and data cleaning is a crucial task in meeting it. A set of methods and tools has been developed for this purpose. However, at least two questions remain open: how can the efficiency of data cleaning be improved, and how can its degree of automation be increased? This paper addresses both questions by presenting a novel framework for managing data cleaning in data warehouses. The framework focuses on the use of data quality dimensions and decouples the cleaning process into several sub-processes. An initial test run of the processes in the framework demonstrates that the approach is efficient and scalable for data cleaning in data warehouses.
Pages: 473-478
Page count: 6