On Correcting XML Documents with Respect to a Schema

Authors
Amavi, Joshua [1]
Bouchou, Beatrice [2]
Savary, Agata [2]
Affiliations
[1] Univ Orleans, LIFO, Orleans, France
[2] Univ Tours, LI, Blois, France
Source
COMPUTER JOURNAL | 2014 / Vol. 57 / No. 5
Keywords
XML processing; document-to-schema correction; tree edit distance; structural similarity; algorithm; revalidation; validation; evolution
DOI
10.1093/comjnl/bxt006
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
We present an algorithm for the correction of an XML document with respect to schema constraints expressed as a document type definition. Given a well-formed XML document t seen as a tree, a schema S and a non-negative threshold th, the algorithm finds every tree t' valid with respect to S such that the edit distance between t and t' is no higher than th. The algorithm is based on a recursive exploration of the finite-state automata representing structural constraints imposed by the schema, as well as on the construction of an edit distance matrix storing edit sequences leading to correction trees. We prove the termination, correctness and completeness of the algorithm, as well as its exponential time complexity. We also perform experimental tests on real-life XML data showing the influence of various input parameters on the execution time and on the number of solutions found. The algorithm's implementation demonstrates polynomial rather than exponential behavior. It has been made public under the GNU LGPL v3 license. As we show in our in-depth discussion of the related work, this is the first full-fledged study of the document-to-schema correction problem.
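The abstract describes exploring the finite-state automata derived from the schema while tracking edit distances against a threshold th. The sketch below illustrates that idea in a heavily simplified, string-level form: it corrects only the sequence of child element names of a single node against one content model, with unit costs for insertion, deletion and relabelling. It is not the authors' algorithm, which works on whole trees and builds an edit distance matrix of edit sequences. The function name `corrections`, the dictionary-based DFA encoding and the example content model `title, author+, year?` are hypothetical choices made for illustration only.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical DFA encoding: dfa[state][label] -> next state.
DFA = Dict[int, Dict[str, int]]


def corrections(word: List[str], dfa: DFA, start: int,
                finals: FrozenSet[int], th: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Return every word accepted by `dfa` whose unit-cost edit distance
    to `word` is at most `th`, paired with that distance."""
    n = len(word)
    max_len = n + th  # longer candidates would need more than th insertions
    results: List[Tuple[Tuple[str, ...], int]] = []

    def explore(state: int, prefix: List[str], row: List[int]) -> None:
        # row[j] is the edit distance between `prefix` and word[:j].
        if state in finals and row[n] <= th:
            results.append((tuple(prefix), row[n]))
        # Prune: the row minimum never decreases as the prefix grows.
        if len(prefix) == max_len or min(row) > th:
            return
        for label, nxt in sorted(dfa.get(state, {}).items()):
            new_row = [row[0] + 1]
            for j in range(1, n + 1):
                cost = 0 if word[j - 1] == label else 1
                new_row.append(min(new_row[j - 1] + 1,   # word[j-1] deleted
                                   row[j] + 1,           # label inserted
                                   row[j - 1] + cost))   # match or relabel
            explore(nxt, prefix + [label], new_row)

    explore(start, [], list(range(n + 1)))
    return results


if __name__ == "__main__":
    # Content model "title, author+, year?" as a DFA (hypothetical example).
    model: DFA = {0: {"title": 1}, 1: {"author": 2}, 2: {"author": 2, "year": 3}}
    print(corrections(["title", "year"], model, 0, frozenset({2, 3}), 1))
```

For the invalid child sequence `title, year` and th = 1, the sketch returns the two valid sequences at distance 1: `title, author` (relabel `year`) and `title, author, year` (insert `author`). Because each DFA path spells a distinct word, every correction is reported once; the depth bound `n + th` and the row-minimum pruning keep the search finite even when the automaton has cycles.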
Pages: 639-674
Number of pages: 36