Data Vault Mappings to Dimensional Model Using Schema Matching

Authors
Puonti, Mikko [1, 2]
Raitalaakso, Timo [1, 2]
Affiliations
[1] Solita Ltd, Akerlundinkatu 11, Tampere 33100, Finland
[2] Tampere Univ, Kalevantie 4, Tampere 33100, Finland
Keywords
Schema matching; Data flow; Data warehouse; Data vault; Dimensional model
DOI
10.1007/978-3-030-37632-1_5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In data warehousing, business-driven development defines the data requirements needed to fulfill reporting needs. A data warehouse stores current and historical data in a single place. Its architecture consists of several layers, each with its own purpose: a staging layer is a data storage area that assists data loading, a data vault modelled layer is the persistent storage that integrates data and stores its history, and a publish layer presents data using a vocabulary that is familiar to the information users. When the process is driven by business requirements and starts from the publish layer structure, the manual transformation work requires a specialist who knows the data vault model. Our goal is to reduce the number of entities that can be selected in a transformation so that an individual developer does not need to know the whole solution but can focus on a subset of entities (a partial schema). In this paper, we present two schema matchers, one based on attribute names and another based on data flow mapping information. Schema matching based on data flow mappings is a novel addition to the current schema matching literature. Using the Northwind example, we show how these two matchers affect the formation of a partial schema for transformation source entities. Based on our experiment with Northwind, we conclude that combining schema matching algorithms produces correct entities in the partial schema.
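The idea of combining a name-based matcher with a data-flow-based matcher can be illustrated with a minimal sketch. This is not the authors' implementation: the entity and attribute names (loosely modelled on Northwind), the similarity threshold, the use of `difflib` for name similarity, and the representation of data flow mappings as (staging attribute, vault entity) pairs are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical data vault entities and their attributes (illustrative only).
VAULT = {
    "hub_customer": ["customer_id"],
    "sat_customer": ["company_name"],
    "sat_customer_address": ["town"],  # same data as "city", but renamed
    "hub_order": ["order_id"],
    "hub_product": ["product_id"],
}

# Hypothetical data flow mappings: which staging attribute loads which entity.
FLOWS = [
    ("stg_customers.CustomerID", "hub_customer"),
    ("stg_customers.CompanyName", "sat_customer"),
    ("stg_customers.City", "sat_customer_address"),
    ("stg_orders.OrderID", "hub_order"),
    ("stg_products.ProductID", "hub_product"),
]


def name_matcher(target_attrs, vault, threshold=0.8):
    """Entities with an attribute whose name resembles a target attribute."""
    hits = set()
    for entity, attrs in vault.items():
        for attr in attrs:
            for target in target_attrs:
                score = SequenceMatcher(None, attr.lower(), target.lower()).ratio()
                if score >= threshold:
                    hits.add(entity)
    return hits


def flow_matcher(target_lineage, flows):
    """Entities loaded from staging attributes that also feed the target."""
    return {entity for staging_attr, entity in flows if staging_attr in target_lineage}


# Target publish-layer entity (assumed): a customer dimension.
dim_customer_attrs = ["customer_id", "company_name", "city"]
# Assumed lineage: the staging attributes the dimension is known to draw on.
dim_customer_lineage = {
    "stg_customers.CustomerID",
    "stg_customers.CompanyName",
    "stg_customers.City",
}

name_hits = name_matcher(dim_customer_attrs, VAULT)
flow_hits = flow_matcher(dim_customer_lineage, FLOWS)
partial_schema = name_hits | flow_hits  # combine both matchers

print(sorted(partial_schema))
```

In this toy setup the name matcher alone misses `sat_customer_address`, because `town` does not resemble `city`, while the flow matcher recovers it from the staging lineage. Taking the union is only one possible way to combine the two result sets.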
Pages: 55-64
Page count: 10