Structural and semantic aspects of similarity of Document Type Definitions and XML schemas

Cited by: 19
Authors
Wojnar, Ales [1]
Mlynkova, Irena [1]
Dokulil, Jiri [1]
Affiliations
[1] Charles Univ Prague, Dept Software Engn, Fac Math & Phys, CR-11800 Prague 1, Czech Republic
Keywords
XML schema; DTD; XSD; Similarity; Data semantics; Structural analysis; PERFORMANCE; METHODOLOGY; ALGORITHM;
DOI
10.1016/j.ins.2009.12.024
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The natural optimization strategy for XML-to-relational mapping methods is to exploit the similarity of XML data. However, none of the current similarity evaluation approaches is suitable for this purpose: the key emphasis is currently put on semantic similarity of XML data, whereas XML-to-relational mapping methods depend mainly on analysis of its structure. In this paper we propose an approach that applies a verified strategy for structural similarity evaluation, tree edit distance, to DTD constructs. This approach copes with the fact that DTDs involve several types of nodes and can form general graphs. In addition, it is optimized for the specific features of XML data and, if required, it can exploit the semantics of element/attribute names. Using a set of experiments we show the impact of these extensions on similarity evaluation. Finally, we discuss how this approach can be extended to XSDs, which involve plenty of "syntactic sugar", i.e. constructs that are structurally or semantically equivalent. (C) 2010 Elsevier Inc. All rights reserved.
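As a rough illustration of the tree edit distance idea named in the abstract, the sketch below computes the classic unit-cost edit distance between two ordered, labeled trees. It is not the authors' method: the paper's approach additionally handles several DTD node types, general graphs, and optional semantic matching of names, none of which is modeled here. The tree encoding and the names `size`, `forest_distance`, and `tree_edit_distance` are assumptions made for this example only.

```python
from functools import lru_cache

# Assumed encoding: a tree is a (label, children) pair, where children is a
# tuple of trees. A forest is a tuple of trees. Tuples keep everything
# hashable so that results can be cached.


def size(forest):
    """Count all nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert every remaining node of f2
    if not f2:
        return size(f1)  # delete every remaining node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the root of the rightmost tree in f1; its children move up.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the root of the rightmost tree in f2.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Map the two rightmost roots onto each other (relabel if they differ).
    relabel = (forest_distance(kids1, kids2)
               + forest_distance(f1[:-1], f2[:-1])
               + (0 if label1 == label2 else 1))
    return min(delete, insert, relabel)


def tree_edit_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))


# Hypothetical element structures, loosely resembling two DTD content models.
book_a = ("book", (("title", ()), ("author", ())))
book_b = ("book", (("title", ()), ("author", ()), ("year", ())))
distance = tree_edit_distance(book_a, book_b)
```

Here `book_b` differs from `book_a` by a single inserted `year` node, so `distance` is 1. The memoized recursion is chosen for brevity, not speed; it is only practical for small trees.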
Pages: 1817-1836
Page count: 20
Related papers
50 in total
  • [41] Similarity Algorithm Based on Weighted Hierarchical Structure of XML Document
    Sun, Xia
    Cheng, Hong-Bin
    Wang, Hai-Jun
    2009 WASE INTERNATIONAL CONFERENCE ON INFORMATION ENGINEERING, ICIE 2009, VOL II, 2009, : 143 - +
  • [42] A new sequential mining approach to XML document similarity computation
    Leung, HP
    Chung, FL
    Chan, SCF
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2003, 2637 : 356 - 362
  • [43] Using structural similarity for clustering XML documents
    Aitelhadj, Ali
    Boughanem, Mohand
    Mezghiche, Mohamed
    Souam, Fatiha
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 32 (01) : 109 - 139
  • [44] Structural similarity between XML documents and DTDs
    Ng, PKL
    Ng, VTY
    COMPUTATIONAL SCIENCE - ICCS 2003, PT III, PROCEEDINGS, 2003, 2659 : 412 - 421
  • [45] KCAM: Concentrating on structural similarity for XML fragments
    Kong, Lingbo
    Tang, Shiwei
    Yang, Dongqing
    Wang, Tengjiao
    Gao, Jun
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2006, 4016 : 36 - 48
  • [47] XML Structural Similarity Search Using MapReduce
    Yuan, Peisen
    Sha, Chaofeng
    Wang, Xiaoling
    Yang, Bin
    Zhou, Aoying
    Yang, Su
    WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2010, 6184 : 169 - +
  • [48] Clustering XML documents based on structural similarity
    Xing, Guangming
    Xia, Zhonghang
    Guo, Jinhua
    ADVANCES IN DATABASES: CONCEPTS, SYSTEMS AND APPLICATIONS, 2007, 4443 : 905 - +
  • [49] A novel method for measuring semantic similarity for XML schema matching
    Jeong, Buhwan
    Lee, Damon
    Cho, Hyunbo
    Lee, Jaewook
    EXPERT SYSTEMS WITH APPLICATIONS, 2008, 34 (03) : 1651 - 1658
  • [50] Semantic Similarity Analysis of XML Schema Using Grid Computing
    Kim, Jaewook
    Lee, Sookyoung
    Halem, Milton
    Peng, Yun
    PROCEEDINGS OF THE 2009 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008, : 57 - 62