Uncovering the Evolution History of Data Lakes

Cited by: 0
Authors
Klettke, Meike [1]
Awolin, Hannes [1]
Stoerl, Uta [2]
Mueller, Daniel [2]
Scherzinger, Stefanie [3]
Affiliations
[1] Univ Rostock, Rostock, Germany
[2] Univ Appl Sci, Darmstadt, Germany
[3] OTH Regensburg, Regensburg, Germany
Keywords
NoSQL databases; schema version extraction; evolution operations; integrity constraints; inclusion dependencies
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data accumulating in data lakes can become inaccessible in the long run when its semantics are not available. The heterogeneity of data formats and the sheer volume of the data collections prohibit cleaning and unifying the data manually. Thus, tools for automated data lake analysis are of great interest. In this paper, we target the particular problem of reconstructing the schema evolution history from data lakes. Knowing how the data is structured, and how this structure has evolved over time, enables programmatic access to the lake. By deriving a sequence of schema versions, rather than a single schema, we take into account structural changes over time. Moreover, we address the challenge of detecting inclusion dependencies. This is a prerequisite for mapping between succeeding schema versions, and in particular, for detecting nontrivial changes such as a property having been moved or copied. We evaluate our approach for detecting inclusion dependencies using the MovieLens dataset, as well as an adaptation of a dataset containing botanical descriptions, to cover specific edge cases.
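To make the notion of an inclusion dependency concrete, the following is a minimal sketch of a naive unary check: a property `A.p` is included in `B.q` if every value of `A.p` also occurs in `B.q`. This is not the detection approach of the paper, which the abstract does not describe; the function names, the collection layout (lists of flat dictionaries), and the sample data are all assumptions made for illustration.

```python
from itertools import permutations


def property_values(docs, prop):
    """Collect the set of non-null values that `prop` takes across documents."""
    return {d[prop] for d in docs if d.get(prop) is not None}


def unary_inclusion_dependencies(collections):
    """Return all ((A, p), (B, q)) pairs where the values of A.p are a subset of B.q.

    `collections` maps a collection name to a list of flat documents (dicts)
    whose values are hashable. This is a brute-force pairwise comparison.
    """
    columns = {}
    for name, docs in collections.items():
        props = {p for d in docs for p in d}
        for p in sorted(props):
            values = property_values(docs, p)
            if values:  # skip properties that never carry a value
                columns[(name, p)] = values
    return [
        (lhs, rhs)
        for lhs, rhs in permutations(columns, 2)
        if columns[lhs] <= columns[rhs]
    ]


# Hypothetical sample data, loosely shaped like a movie-rating dataset.
collections = {
    "movies": [
        {"id": 1, "title": "A"},
        {"id": 2, "title": "B"},
        {"id": 3, "title": "C"},
    ],
    "ratings": [
        {"movie_id": 1, "stars": 5},
        {"movie_id": 3, "stars": 4},
    ],
}

inds = unary_inclusion_dependencies(collections)
```

On real data, a subset test of this kind also reports accidental inclusions (for example, small integer ranges contained in an ID column), so candidates generally need further filtering before they are used to infer that a property was moved or copied.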
Pages: 2462-2471
Page count: 10