Minimizing detail data in data warehouses

Authors
Akinde, MO [1]
Jensen, OG [1]
Böhlen, MH [1]
Affiliation
[1] Univ Aalborg, Dept Comp Sci, DK-9220 Aalborg Ost, Denmark
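Source
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract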
Data warehouses collect and maintain large amounts of data from several distributed and heterogeneous data sources. For reasons of security, operational requirements, and technical feasibility, it is often impossible for data warehouses to access the data sources directly. Instead, data warehouses have to replicate legacy information as detail data in order to maintain their summary data. In this paper we investigate how to minimize the amount of detail data stored in a data warehouse. More specifically, we identify the minimal amount of data that has to be replicated in order to maintain, either incrementally or by recomputation, summary data defined in terms of generalized project-select-join (GPSJ) views. We show how to minimize the number of tuples and attributes in the current detail tables and even aggregate them where possible. The amount of data to be stored in current detail tables is minimized by exploiting smart duplicate compression in addition to local and join reductions. We identify situations where it becomes possible to omit the typically huge fact table, and we prove that these techniques in concert ensure that the current detail data is minimal in the sense that no subset of it permits accurate maintenance of the same summary data. Finally, we sketch how existing maintenance methods can be adapted to use the minimal detail tables we propose.
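The idea of storing compressed detail data instead of a full replica can be illustrated with a small sketch. This is not the paper's algorithm, only a toy instance of duplicate compression under stated assumptions: the `sales` rows, the attribute names, and the helpers `compress_detail` and `summarize` are all hypothetical, and the summary is a simple SUM/COUNT grouped by one attribute rather than a general GPSJ view.

```python
from collections import Counter, defaultdict

# Hypothetical source rows; attribute names are illustrative only.
sales = [
    {"order_id": 1, "product": "pen", "price": 2, "clerk": "a"},
    {"order_id": 2, "product": "pen", "price": 2, "clerk": "b"},
    {"order_id": 3, "product": "pen", "price": 3, "clerk": "a"},
    {"order_id": 4, "product": "ink", "price": 5, "clerk": "c"},
]

def compress_detail(rows, keep):
    """Project rows onto the attributes in `keep` and collapse
    identical projected tuples into a single entry with a count."""
    return Counter(tuple(row[a] for a in keep) for row in rows)

def summarize(compressed, keep, group_by, measure):
    """Recompute (SUM(measure), COUNT(*)) per group from compressed detail data."""
    g, m = keep.index(group_by), keep.index(measure)
    totals = defaultdict(lambda: [0, 0])
    for values, n in compressed.items():
        totals[values[g]][0] += values[m] * n  # each tuple stands for n source rows
        totals[values[g]][1] += n
    return {key: tuple(v) for key, v in totals.items()}

# Only the attributes the summary needs are retained (assumption: product, price).
keep = ("product", "price")
compressed = compress_detail(sales, keep)
summary = summarize(compressed, keep, "product", "price")

# An insertion at the source only touches the compressed detail data.
new_row = {"order_id": 5, "product": "pen", "price": 2, "clerk": "c"}
compressed.update(compress_detail([new_row], keep))  # Counter.update adds counts
summary_after = summarize(compressed, keep, "product", "price")
```

In this sketch four source rows shrink to three stored tuples, and attributes that the summary never uses (`order_id`, `clerk`) are not replicated at all.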
Pages: 293-307
Page count: 15