Schema profiling of document-oriented databases

Cited by: 42
Authors
Gallinucci, Enrico [1, 2]
Golfarelli, Matteo [1, 2]
Rizzi, Stefano [1, 2]
Affiliations
[1] Univ Bologna, DISI, Viale Risorgimento 2, I-40136 Bologna, Italy
[2] CINI, Via Solaria 113, I-00198 Rome, Italy
Funding
European Union Horizon 2020
Keywords
NoSQL; Document-oriented databases; Schema discovery; Decision trees
DOI
10.1016/j.is.2018.02.007
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In document-oriented databases, schema is a soft concept, and the documents in a collection can be stored using different local schemata. This gives designers and implementers greater flexibility; however, when sets of documents with different (and possibly conflicting) schemata are to be analyzed or integrated, extra effort is required to understand the rules that drove the use of alternative schemata. In this paper we propose a technique, called schema profiling, that explains the schema variants within a collection by capturing the hidden rules behind their use. We express these rules in the form of a decision tree (the schema profile). Consistent with the requirements we elicited from real users, we aim at creating explicative, precise, and concise schema profiles. The algorithm we adopt to this end is inspired by the well-known C4.5 classification algorithm and builds on two original features: the coupling of value-based and schema-based conditions within schema profiles, and the introduction of a novel measure of entropy to assess the quality of a schema profile. Experimental tests on both synthetic and real datasets demonstrate the effectiveness and efficiency of our approach. © 2018 Elsevier Ltd. All rights reserved.
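The idea of scoring candidate split conditions by how well they separate schema variants can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses plain Shannon entropy and C4.5-style information gain rather than the novel entropy measure mentioned in the abstract, and the function names (`local_schema`, `schema_entropy`, `information_gain`) and the sample documents are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def local_schema(doc, prefix=""):
    """Return the set of dotted field paths in a document (nested dicts are flattened)."""
    fields = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            fields |= local_schema(value, prefix=f"{path}.")
        else:
            fields.add(path)
    return frozenset(fields)


def schema_entropy(docs):
    """Shannon entropy (in bits) of the distribution of local schemata among docs."""
    counts = Counter(local_schema(d) for d in docs)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(docs, condition):
    """Entropy reduction obtained by splitting docs on a boolean condition."""
    yes = [d for d in docs if condition(d)]
    no = [d for d in docs if not condition(d)]
    if not yes or not no:
        return 0.0  # the condition does not actually split the collection
    n = len(docs)
    after = len(yes) / n * schema_entropy(yes) + len(no) / n * schema_entropy(no)
    return schema_entropy(docs) - after


# Hypothetical collection with two schema variants.
docs = [
    {"type": "run", "distance": 5.0, "pace": 6.1},
    {"type": "run", "distance": 8.2, "pace": 5.7},
    {"type": "swim", "laps": 20},
    {"type": "swim", "laps": 32},
]

# A value-based condition tests a field's value; a schema-based one tests a field's presence.
value_based = lambda d: d.get("type") == "run"
schema_based = lambda d: "laps" in d

print(information_gain(docs, value_based))
print(information_gain(docs, schema_based))
```

In this toy collection both kinds of condition separate the two variants perfectly, so either could serve as the root of a one-level tree; a value-based condition such as `type == "run"` is arguably the more explicative of the two because it points to a reason for the variant rather than restating its structure.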
Pages: 13-25
Page count: 13
Related papers
50 in total
  • [1] Automatic Extraction of a Document-oriented NoSQL Schema
    Abdelhedi, Fatma
    Brahim, Amal Ait
    Rajhi, Hela
    Ferhat, Rabah Tighilt
    Zurfluh, Gilles
    [J]. PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS (ICEIS 2021), VOL 1, 2021, : 192 - 199
  • [2] Automatic Schema Generation for Document-Oriented Systems
    Gomez, Paola
    Casallas, Rubby
    Roncancio, Claudia
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2020, PT I, 2020, 12391 : 152 - 163
  • [3] Implementation of Multidimensional Databases with Document-Oriented NoSQL
    Chevalier, M.
    El Malki, M.
    Kopliku, A.
    Teste, O.
    Tournier, R.
    [J]. BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, 2015, 9263 : 379 - 390
  • [5] Meta-Modelling In Document-Oriented Databases
    Okockis, Vilius
    Bukauskas, Linas
    [J]. DATABASES AND INFORMATION SYSTEMS VIII, 2014, 270 : 57 - 70
  • [6] Document-Oriented Data Schema for Relational Database Migration to NoSQL
    Hamouda, Shady
    Zainol, Zurinahni
    [J]. 2017 3RD INTERNATIONAL CONFERENCE ON BIG DATA INNOVATIONS AND APPLICATIONS (INNOVATE-DATA), 2017, : 43 - 50
  • [7] Design a Data Warehouse Schema from Document-Oriented database
    Bouaziz, Senda
    Nabli, Ahlem
    Gargouri, Faiez
    [J]. KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES 2019), 2019, 159 : 221 - 230
  • [8] Enhancing Open Collaborative Applications Using Document-oriented Databases
    Salarmehr, Reza
    [J]. 2016 SECOND INTERNATIONAL CONFERENCE ON WEB RESEARCH (ICWR), 2016, : 166 - 169
  • [9] Document-oriented Models for Data Warehouses: NoSQL Document-oriented for Data Warehouses
    Chevalier, Max
    El Malki, Mohammed
    Kopliku, Arlind
    Teste, Olivier
    Tournier, Ronan
    [J]. PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 1 (ICEIS), 2016, : 142 - 149
  • [10] Performance Evaluations of Document-Oriented Databases using GPU and Cache Structure
    Morishima, Shin
    Matsutani, Hiroki
    [J]. 2015 IEEE TRUSTCOM/BIGDATASE/ISPA, VOL 3, 2015, : 108 - 115