Minimizing detail data in data warehouses

Authors
Akinde, MO [1]
Jensen, OG [1]
Böhlen, MH [1]
Affiliation
[1] Univ Aalborg, Dept Comp Sci, DK-9220 Aalborg Ost, Denmark
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data warehouses collect and maintain large amounts of data from several distributed and heterogeneous data sources. For reasons of security, operational requirements, and technical feasibility, it is often impossible for data warehouses to access the data sources directly. Instead, data warehouses have to replicate legacy information as detail data in order to be able to maintain their summary data. In this paper we investigate how to minimize the amount of detail data stored in a data warehouse. More specifically, we identify the minimal amount of data that has to be replicated in order to maintain, either incrementally or by recomputation, summary data defined in terms of generalized project-select-join (GPSJ) views. We show how to minimize the number of tuples and attributes in the current detail tables and even aggregate them where possible. The amount of data to be stored in current detail tables is minimized by exploiting smart duplicate compression in addition to local and join reductions. We identify situations where it becomes possible to omit the typically huge fact table, and we prove that these techniques in concert ensure that the current detail data is minimal in the sense that no subset of it permits accurate maintenance of the same summary data. Finally, we sketch how existing maintenance methods can be adapted to use the minimal detail tables we propose.
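The idea of shrinking a detail table by dropping unneeded attributes and collapsing duplicates can be illustrated with a small sketch. This is not the paper's algorithm: it assumes, for illustration only, that duplicate compression means projecting each row onto the attributes a summary view needs and storing one copy of each distinct tuple with a count. The table `sales`, its attributes, and the helpers `compress_detail` and `sum_by` are hypothetical names.

```python
from collections import Counter

def compress_detail(rows, needed_attrs):
    """Project rows onto needed_attrs and collapse duplicates into {tuple: count}.

    Illustrative assumption: this is one simple reading of duplicate
    compression, not the paper's definition.
    """
    return dict(Counter(tuple(row[a] for a in needed_attrs) for row in rows))

def sum_by(compressed, needed_attrs, group_attr, measure_attr):
    """Recompute SUM(measure_attr) GROUP BY group_attr from the compressed table."""
    gi = needed_attrs.index(group_attr)
    mi = needed_attrs.index(measure_attr)
    totals = {}
    for key, count in compressed.items():
        # Each stored tuple stands for `count` identical original rows.
        totals[key[gi]] = totals.get(key[gi], 0) + key[mi] * count
    return totals

# Hypothetical detail data; "id" and "clerk" are not needed by the summary view.
sales = [
    {"id": 1, "store": "A", "price": 10, "clerk": "x"},
    {"id": 2, "store": "A", "price": 10, "clerk": "y"},
    {"id": 3, "store": "B", "price": 5, "clerk": "x"},
]
needed = ["store", "price"]

compressed = compress_detail(sales, needed)
totals = sum_by(compressed, needed, "store", "price")
```

In this toy case three detail rows shrink to two stored tuples, and the per-store sum can still be recomputed from the compressed table alone, without access to the original rows.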
Pages: 293-307
Page count: 15