JSON document clustering based on schema embeddings

Cited by: 1
Authors
Priya, D. Uma [1]
Thilagam, P. Santhi [1]
Affiliations
[1] Natl Inst Technol Karnataka, Dept Comp Sci & Engn, Surathkal, India
Keywords
Clustering; contextual similarity; deep autoencoders; embeddings; JSON; CLASSIFICATION
DOI
10.1177/01655515221116522
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The growing popularity of JSON as the data storage and interchange format increases the availability of massive multi-structured data collections. Clustering JSON documents has become a significant issue in organising large data collections. Existing research uses various structural similarity measures to perform clustering. However, differently annotated JSON structures may also encode semantic relatedness, necessitating the use of both syntactic and semantic properties of heterogeneous JSON schemas. Using the SchemaEmbed model, this paper proposes an embedding-based clustering approach for grouping contextually similar JSON documents. The SchemaEmbed model is designed using the pre-trained Word2Vec model and a deep autoencoder that considers both syntactic and semantic information of JSON schemas for clustering the documents. The Word2Vec model learns the attribute embeddings, and a deep autoencoder is designed to generate context-aware schema embeddings. Finally, the context-based similar JSON documents are grouped using a clustering algorithm. The effectiveness of the proposed work is evaluated using both real and synthetic datasets. The results and findings show that the proposed approach improves clustering quality significantly, with a high NMI score of 75%. In addition, we demonstrate that clustering results obtained by contextual similarity are superior to those obtained by traditional semantic similarity models.
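The abstract reports clustering quality as a normalized mutual information (NMI) score. As a minimal sketch of how such a score can be computed, the function below compares a ground-truth labelling with a predicted clustering. The function name `normalized_mutual_info` and the choice of normalising by the arithmetic mean of the two entropies are assumptions made for illustration; the record does not state which NMI variant the authors used.

```python
import math
from collections import Counter


def normalized_mutual_info(labels_true, labels_pred):
    """NMI between two labelings, normalised by the mean of their entropies.

    Hypothetical helper for illustration; the normalisation variant is an
    assumption, not taken from the paper.
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label sequences must be non-empty and equally long")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of the empirical label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_true = entropy(count_true)
    h_pred = entropy(count_pred)

    # Mutual information from the joint and marginal label counts.
    mi = 0.0
    for (t, p), c in count_joint.items():
        mi += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))

    denom = (h_true + h_pred) / 2
    if denom == 0:
        # Both labelings put everything in one group: treat as identical.
        return 1.0
    return mi / denom


if __name__ == "__main__":
    truth = ["user", "user", "order", "order"]
    same_grouping = [1, 1, 0, 0]      # same partition, different label names
    unrelated = [0, 1, 0, 1]          # partition independent of the truth
    print(normalized_mutual_info(truth, same_grouping))
    print(normalized_mutual_info(truth, unrelated))
```

NMI depends only on how documents are grouped, not on the cluster names, so a clustering that reproduces the true partition under different labels still scores at the maximum.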
Pages: 1112-1130
Page count: 19
Related papers
50 in total
  • [31] Translating JSON Data into Relational Data Using Schema-oblivious Approaches
    Bahta, Rahwa
    Atay, Mustafa
    PROCEEDINGS OF THE 2019 ANNUAL ACM SOUTHEAST CONFERENCE (ACMSE 2019), 2019, : 233 - 236
  • [32] JSON Encryption
    Abd El-Aziz, A. A.
    Kannan, A.
    2014 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS (ICCCI), 2014,
  • [33] DSON: JSON CRDT Using Delta-Mutations For Document Stores
    Rinberg, Arik
    Solomon, Tomer
    Shlomo, Roee
    Khazma, Guy
    Lushi, Gal
    Keidar, Idit
    Ta-Shma, Paula
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2022, 15 (05): : 1053 - 1065
  • [34] LiteIndex: Memory-Efficient Schema-Agnostic Indexing for JSON documents in SQLite
    Shang, Siqi
    Wu, Qihong
    Wang, Tianyu
    Shao, Zili
    2021 26TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2021, : 435 - 440
  • [35] Implementation of SNMP-JSON Translator and Integrating SNMP Agents with JSON based Network Management System
    Pramodh, Kasula Chaithanya
    Nikhil, Iluri
    Singh, J. Ranjith
    2017 7TH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT), 2017, : 67 - 73
  • [36] Temporal JSON
    Goyal, Aayush
    Dyreson, Curtis
    2019 IEEE 5TH INTERNATIONAL CONFERENCE ON COLLABORATION AND INTERNET COMPUTING (CIC 2019), 2019, : 135 - 144
  • [37] Temporal JSON schema versioning in the TJSchema framework
    1600, Digital Information Research Foundation, 11 Ramanujam Street, T.Nagar,, Chennai, 600017, India (15):
  • [38] Parametric schema inference for massive JSON datasets
    Mohamed-Amine Baazizi
    Dario Colazzo
    Giorgio Ghelli
    Carlo Sartiani
    The VLDB Journal, 2019, 28 : 497 - 521
  • [39] Composing JSON-Based Web APIs
    Izquierdo, Javier Luis Canovas
    Cabot, Jordi
    WEB ENGINEERING, ICWE 2014, 2014, 8541 : 390 - 399
  • [40] Providing Research Graph Data in JSON-LD Using Schema.org
    Wang, Jingbo
    Aryani, Amir
    Wyborn, Lesley
    Evans, Ben
    WWW'17 COMPANION: PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2017, : 1213 - 1218