JSON document clustering based on schema embeddings

Cited by: 1
Authors
Priya, D. Uma [1]
Thilagam, P. Santhi [1]
Affiliations
[1] Natl Inst Technol Karnataka, Dept Comp Sci & Engn, Surathkal, India
Keywords
Clustering; contextual similarity; deep autoencoders; embeddings; JSON; CLASSIFICATION
DOI
10.1177/01655515221116522
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The growing popularity of JSON as the data storage and interchange format increases the availability of massive multi-structured data collections. Clustering JSON documents has become a significant issue in organising large data collections. Existing research uses various structural similarity measures to perform clustering. However, differently annotated JSON structures may also encode semantic relatedness, necessitating the use of both syntactic and semantic properties of heterogeneous JSON schemas. Using the SchemaEmbed model, this paper proposes an embedding-based clustering approach for grouping contextually similar JSON documents. The SchemaEmbed model is designed using the pre-trained Word2Vec model and a deep autoencoder that considers both syntactic and semantic information of JSON schemas for clustering the documents. The Word2Vec model learns the attribute embeddings, and a deep autoencoder is designed to generate context-aware schema embeddings. Finally, the context-based similar JSON documents are grouped using a clustering algorithm. The effectiveness of the proposed work is evaluated using both real and synthetic datasets. The results and findings show that the proposed approach improves clustering quality significantly, with a high NMI score of 75%. In addition, we demonstrate that clustering results obtained by contextual similarity are superior to those obtained by traditional semantic similarity models.
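The abstract's starting point (a schema per JSON document) and its evaluation metric (NMI) can be illustrated with a minimal Python sketch. It is not the SchemaEmbed model: the Word2Vec and autoencoder stages are omitted, the helper names `attribute_paths` and `nmi` are hypothetical, the dotted-path representation of a schema is an assumption, and NMI is normalised here by the arithmetic mean of the two entropies, which may differ from the variant used in the paper.

```python
import math
from collections import Counter


def attribute_paths(value, prefix=""):
    """Return the set of dotted attribute paths in a parsed JSON value.

    Assumption: a document's "schema" is approximated by its attribute
    paths; array elements share the path of the array that holds them.
    """
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= attribute_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            paths |= attribute_paths(item, prefix)
    return paths


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Normalised mutual information between two labelings.

    Assumption: normalisation by the arithmetic mean of the entropies.
    """
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    mi = 0.0
    for (t, p), c in joint.items():
        mi += (c / n) * math.log(c * n / (true_counts[t] * pred_counts[p]))
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    # Two trivial one-cluster labelings agree perfectly by convention.
    return mi / denom if denom > 0 else 1.0


# Illustrative documents only; they are not from the paper's datasets.
doc = {"title": "x", "author": {"name": "y"}, "tags": [{"label": "z"}]}
schema = attribute_paths(doc)
score = nmi([0, 0, 1, 1], [1, 1, 0, 0])
```

In this sketch the path sets would feed an embedding step, and `nmi` would compare the resulting cluster assignments with reference labels; cluster label names do not matter, only the grouping.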
Pages: 1112-1130
Number of pages: 19
Related papers
50 in total
  • [21] A JSON document algebra for query optimization
    Llano-Rios, Tomas
    Khalefa, Mohamed
    Badia, Antonio
    INFORMATION SYSTEMS, 2025, 132
  • [22] JSON: Data model, Query languages and Schema specification
    Bourhis, Pierre
    Reutter, Juan L.
    Suarez, Fernando
    Vrgoc, Domagoj
    PODS'17: PROCEEDINGS OF THE 36TH ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2017, : 123 - 135
  • [23] Blind Queries Applied to JSON Document Stores
    Marrara, Stefania
    Pelucchi, Mauro
    Psaila, Giuseppe
    INFORMATION, 2019, 10 (10)
  • [24] HAJPAQUE: Hardware Accelerator for JSON Parsing, Querying and Schema Validation
    Agarwal, Samiksha
    Sarangi, Smruti R.
    2022 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2022), 2022, : 1 - 7
  • [25] A JSON Token-Based Authentication and Access Management Schema for Cloud SaaS Applications
    Ethelbert, Obinna
    Moghaddam, Faraz Fatemi
    Wieder, Philipp
    Yahyapour, Ramin
    2017 IEEE 5TH INTERNATIONAL CONFERENCE ON FUTURE INTERNET OF THINGS AND CLOUD (FICLOUD 2017), 2017, : 47 - 53
  • [26] JSON Data Management - Supporting Schema-less Development in RDBMS
    Liu, Zhen Hua
    Hammerschmidt, Beda
    McMahon, Doug
    SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, : 1247 - 1258
  • [27] An Empirical Study on the "Usage of Not" in Real-World JSON Schema Documents
    Baazizi, Mohamed-Amine
    Colazzo, Dario
    Ghelli, Giorgio
    Sartiani, Carlo
    Scherzinger, Stefanie
    CONCEPTUAL MODELING, ER 2021, 2021, 13011 : 102 - 112
  • [28] Multi-dimensional Analysis of Industrial Big Data Based JSON Document
    Li, Minbo
    Xu, Juxiong
    Han, Le
    2020 IEEE INTL SYMP ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, INTL CONF ON BIG DATA & CLOUD COMPUTING, INTL SYMP SOCIAL COMPUTING & NETWORKING, INTL CONF ON SUSTAINABLE COMPUTING & COMMUNICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2020), 2020, : 1066 - 1073
  • [29] LEI2JSON: Schema-based validation and conversion of livestock event information
    Habib, Mahir
    Kabir, Muhammad Ashad
    Zheng, Lihong
    SOFTWAREX, 2024, 26
  • [30] JSON Model: a Lightweight Featureful DSL for JSON
    Coelho, Fabien
    Yannou-Medrala, Claire
    ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2024, 2024, 14918 : 3 - 17