On Correcting XML Documents with Respect to a Schema

Authors
Amavi, Joshua [1]
Bouchou, Beatrice [2]
Savary, Agata [2]
Affiliations
[1] Univ Orleans, LIFO, Orleans, France
[2] Univ Tours, LI, Blois, France
Source
COMPUTER JOURNAL | 2014 / Vol. 57 / Issue 05
Keywords
XML processing; document-to-schema correction; tree edit distance; STRUCTURAL SIMILARITY; ALGORITHM; REVALIDATION; VALIDATION; EVOLUTION
DOI
10.1093/comjnl/bxt006
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We present an algorithm for the correction of an XML document with respect to schema constraints expressed as a document type definition. Given a well-formed XML document t seen as a tree, a schema S and a non-negative threshold th, the algorithm finds every tree t' valid with respect to S such that the edit distance between t and t' is no higher than th. The algorithm is based on a recursive exploration of the finite-state automata representing structural constraints imposed by the schema, as well as on the construction of an edit distance matrix storing edit sequences leading to correction trees. We prove the termination, correctness and completeness of the algorithm, as well as its exponential time complexity. We also perform experimental tests on real-life XML data showing the influence of various input parameters on the execution time and on the number of solutions found. The algorithm's implementation demonstrates polynomial rather than exponential behavior. It has been made public under the GNU LGPL v3 license. As we show in our in-depth discussion of the related work, this is the first full-fledged study of the document-to-schema correction problem.
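The core idea in the abstract (explore an automaton for the schema while tracking edit cost against a threshold) can be illustrated with a much simpler, word-level analogue. The sketch below is not the authors' tree algorithm: it corrects only a flat sequence of child labels against a single content model, and it does not build the edit distance matrix mentioned above. The function name `corrections`, the dictionary encoding of the automaton, and the example content model `(a, b*, c)` are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical DFA encoding (an assumption of this sketch):
# transitions[(state, label)] -> next state.
Transitions = Dict[Tuple[int, str], int]
Word = Tuple[str, ...]


def corrections(word: Word, transitions: Transitions, start: int,
                finals: Set[int], th: int) -> Dict[Word, int]:
    """Return every accepted word within edit distance th of `word`,
    mapped to its minimal edit cost (unit-cost insert/delete/substitute)."""
    alphabet = sorted({label for (_, label) in transitions})
    best: Dict[Word, int] = {}
    seen: Dict[Tuple[int, int, Word], int] = {}

    def explore(i: int, state: int, cost: int, out: Word) -> None:
        if cost > th:
            return  # prune: the threshold bounds the search
        key = (i, state, out)
        if key in seen and seen[key] <= cost:
            return  # already reached this configuration at least as cheaply
        seen[key] = cost
        if i == len(word) and state in finals:
            if out not in best or cost < best[out]:
                best[out] = cost
        for label in alphabet:
            nxt = transitions.get((state, label))
            if nxt is None:
                continue
            # Insert `label` without consuming input (cost 1).
            explore(i, nxt, cost + 1, out + (label,))
            if i < len(word):
                # Keep word[i] (cost 0) or substitute it by `label` (cost 1).
                step = 0 if label == word[i] else 1
                explore(i + 1, nxt, cost + step, out + (label,))
        if i < len(word):
            # Delete word[i] (cost 1).
            explore(i + 1, state, cost + 1, out)

    explore(0, start, 0, ())
    return best


# Example automaton for the hypothetical DTD content model (a, b*, c).
DTD_TRANSITIONS: Transitions = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
DTD_FINALS: Set[int] = {2}

if __name__ == "__main__":
    # The child sequence "a b" is invalid; with th = 1 there are two repairs.
    print(corrections(("a", "b"), DTD_TRANSITIONS, 0, DTD_FINALS, 1))
    # prints {('a', 'c'): 1, ('a', 'b', 'c'): 1}
```

Under these assumptions, the threshold plays the same role as `th` in the abstract: raising it enlarges the set of candidate corrections, and the exhaustive enumeration is what makes the worst case expensive.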
Pages: 639-674
Page count: 36