Automatic Generation of Normalized Relational Schemas from Nested Key-Value Data

Cited by: 25
Authors
DiScala, Michael [1]
Abadi, Daniel J. [1]
Affiliations
[1] Yale Univ, New Haven, CT 06520 USA
Funding
US National Science Foundation
DOI
10.1145/2882903.2882924
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Self-describing key-value data formats such as JSON are becoming increasingly popular as application developers choose to avoid the rigidity imposed by the relational model. Database systems designed for these self-describing formats, such as MongoDB, encourage users to use denormalized, heavily nested data models so that relationships across records and other schema information need not be predefined or standardized. Such data models contribute to long-term development complexity, as their lack of explicit entity and relationship tracking burdens new developers unfamiliar with the dataset. Furthermore, the large amount of data repetition present in such data layouts can introduce update anomalies and poor scan performance, which reduce both the quality and performance of analytics over the data. In this paper we present an algorithm that automatically transforms the denormalized, nested data commonly found in NoSQL systems into traditional relational data that can be stored in a standard RDBMS. This process includes a schema generation algorithm that discovers relationships across the attributes of the denormalized datasets in order to organize those attributes into relational tables. It further includes a matching algorithm that discovers sets of attributes that represent overlapping entities and merges those sets together. These algorithms reduce data repetition, allow the use of data analysis tools targeted at relational data, accelerate scan-intensive algorithms over the data, and help users gain a semantic understanding of complex, nested datasets.
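The abstract does not spell out how attribute relationships are discovered, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes nested records hold scalar values only (no arrays), uses a naive exact functional-dependency check as a stand-in for relationship discovery, and all names (`flatten`, `functional_dependencies`, `extract_table`, the sample `records`) are hypothetical.

```python
from itertools import permutations


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value  # assumption: leaf values are hashable scalars
    return flat


def functional_dependencies(rows):
    """Return pairs (a, b) where every value of a maps to a single value of b."""
    attrs = sorted(rows[0])
    deps = []
    for a, b in permutations(attrs, 2):
        seen = {}
        # setdefault records the first b seen for each a; a later mismatch breaks a -> b
        if all(seen.setdefault(row[a], row[b]) == row[b] for row in rows):
            deps.append((a, b))
    return deps


def extract_table(rows, key, dependents):
    """Split `dependents` into a deduplicated table keyed by `key`."""
    columns = [key, *dependents]
    entity = sorted({tuple(row[c] for c in columns) for row in rows})
    remaining = [
        {k: v for k, v in row.items() if k not in dependents} for row in rows
    ]
    return columns, entity, remaining


# Hypothetical denormalized documents: the user is repeated inside each post.
records = [
    {"id": 1, "text": "a", "user": {"id": 10, "name": "ann"}},
    {"id": 2, "text": "b", "user": {"id": 10, "name": "ann"}},
    {"id": 3, "text": "c", "user": {"id": 11, "name": "bob"}},
]

rows = [flatten(r) for r in records]
deps = functional_dependencies(rows)

# user.id determines user.name but not the post id, so the user
# attributes can move to their own table and stop being repeated.
columns, users, posts = extract_table(rows, "user.id", ["user.name"])
```

Here `users` holds one row per distinct user and `posts` keeps only `user.id` as a reference, which illustrates the reduction in data repetition that the abstract describes. An exact dependency check like this is brittle on noisy real data, so a practical system would likely need a more tolerant test.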
Pages: 295-310
Page count: 16
Related papers
50 in total
  • [1] A Comparative Study of Relational Database and Key-Value Database for Big Data Applications
    Puangsaijai, Wittawat
    Puntheeranurak, Sutheera
    [J]. 2017 INTERNATIONAL ELECTRICAL ENGINEERING CONGRESS (IEECON), 2017
  • [2] Knowledge Discovery from Big Social Key-Value Data
    Leung, Carson K.
    Braun, Peter
    Enkhee, Murun
    Pazdor, Adam G. M.
    Sarumi, Oluwafemi A.
    Tran, Kimberly
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (CIT), 2016: 484-491
  • [3] A Relational Database Schema on the Transactional Key-Value Store Scalaris
    Kruber, Nico
    Schintke, Florian
    Berlin, Michael
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2014
  • [4] Robust Data Sharing with Key-Value Stores
    Basescu, Cristina
    Cachin, Christian
    Eyal, Ittay
    Haas, Robert
    Sorniotti, Alessandro
    Vukolic, Marko
    Zachevsky, Ido
    [J]. 2012 42ND ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN), 2012
  • [5] Automatic Generation of Trajectory Data Warehouse Schemas
    Arfaoui, Nouha
    Akaichi, Jalel
    [J]. INTELLIGENT INTERACTIVE MULTIMEDIA SYSTEMS AND SERVICES 2016, 2016, 55: 373-383
  • [6] Reasearch on Database Schema Comparison of Relational Databases and Key-value Stores
    Zhou, Peng
    Li, Mei
    Huang, Jing
    Fang, Hua
    [J]. MODERN TECHNOLOGIES IN MATERIALS, MECHANICS AND INTELLIGENT SYSTEMS, 2014, 1049: 1860-1863
  • [7] A Resource Allocation Controller for Key-Value Data Stores
    Kim, Young Ki
    HoseinyF, M. Reza
    Lee, Young Choon
    Zomaya, Albert Y.
    [J]. 2017 IEEE 16TH INTERNATIONAL SYMPOSIUM ON NETWORK COMPUTING AND APPLICATIONS (NCA), 2017: 281-284
  • [8] Key-value caching of geospatial data for distributed GIS
    Tu, Zhenfa
    Meng, Lingkui
    Zhang, Wen
    Huang, Changqing
    [J]. Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics and Information Science of Wuhan University, 2013, 38 (11): 1339-1343
  • [9] Towards Private Key-Value Data Collection with Histogram
    Zhang, Xiaojian
    Xu, Yaxin
    Fu, Nan
    Meng, Xiaofeng
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (03): 624-637
  • [10] Efficient Key-Value Data Placement for ZNS SSD
    Oh, Gijun
    Yang, Junseok
    Ahn, Sungyong
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (24)