Structural and semantic aspects of similarity of Document Type Definitions and XML schemas

Cited by: 19
Authors
Wojnar, Ales [1]
Mlynkova, Irena [1]
Dokulil, Jiri [1]
Affiliations
[1] Charles Univ Prague, Dept Software Engn, Fac Math & Phys, CR-11800 Prague 1, Czech Republic
Keywords
XML schema; DTD; XSD; Similarity; Data semantics; Structural analysis; PERFORMANCE; METHODOLOGY; ALGORITHM;
DOI
10.1016/j.ins.2009.12.024
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The natural optimization strategy for XML-to-relational mapping methods is exploitation of similarity of XML data. However, none of the current similarity evaluation approaches is suitable for this purpose. While the key emphasis is currently put on semantic similarity of XML data, the main aspect of XML-to-relational mapping methods is analysis of their structure. In this paper we propose an approach that utilizes a verified strategy for structural similarity evaluation - tree edit distance - to DTD constructs. This approach is able to cope with the fact that DTDs involve several types of nodes and can form general graphs. In addition, it is optimized for the specific features of XML data and, if required, it enables one to exploit the semantics of element/attribute names. Using a set of experiments we show the impact of these extensions on similarity evaluation. And, finally, we discuss how this approach can be extended for XSDs, which involve plenty of "syntactic sugar", i.e. constructs that are structurally or semantically equivalent. (C) 2010 Elsevier Inc. All rights reserved.
Pages: 1817-1836
Number of pages: 20
Related papers
50 in total
  • [11] Estimation of Structural Similarity of XML Document Based on Frequency and Path
    Ren Xueli
    Dai Yubiao
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON EDUCATION, MANAGEMENT, COMPUTER AND SOCIETY, 2016, 37 : 272 - 275
  • [12] On semantic weighting and decomposition techniques for XML schemas
    Chen, YF
    Kuo, CCJ
    INTERNET MULTIMEDIA MANAGEMENT SYSTEMS V, 2004, 5601 : 100 - 110
  • [14] XML-SIM-CHANGE: Structure and Content Semantic Similarity Detection among XML Document Versions
    Viyanon, Waraporn
    Madria, Sanjay K.
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2010, PT II, 2010, 6427 : 1061 - 1078
  • [15] Optimization of XML Queries by Using Semantics in XML Schemas and the Document Structure
    Le, Dung Xuan Thi
    Maghaydah, Moad
    Orgun, Mehmet A.
    Zhong, Youliang
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2013, PT I, 2013, 8180 : 343 - 353
  • [16] A novel structural similarity measure on XML data for integrated document management
    Ng, K. L.
    Ng, T. Y.
    JOURNAL OF COMPUTER INFORMATION SYSTEMS, 2007, 48 (01) : 42 - 52
  • [17] Evaluation of a document database description by different XML schemas
    Kenab, M
    Braham, TO
    Bazex, P
    Proceedings of the IASTED International Conference on Databases and Applications, 2004, : 244 - 249
  • [18] A METHODOLOGY FOR USING EDGES TO MEASURE STRUCTURAL AND SEMANTIC SIMILARITY OF XML DOCUMENTS
    Qiu, Hong-Jun
    Yu, Wen-Jing
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 1653 - +
  • [19] A progressive clustering algorithm to group the XML data by structural and semantic similarity
    Nayak, Richi
    Tran, Tien
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2007, 21 (04) : 723 - 743
  • [20] Annotation Rules for XML Schemas with Grouped Semantic Annotations
    Campos-Rebelo, Rogerio
    Moutinho, Filipe
    Paiva, Luis
    Malo, Pedro
    45TH ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY (IECON 2019), 2019, : 5469 - 5474