Detecting changes in XML documents

Cited by: 131
Authors
Cobéna, G [1]
Abiteboul, S [1]
Marian, A [1]
Affiliations
[1] INRIA, Rocquencourt, France
Keywords
DOI
10.1109/ICDE.2002.994696
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project, which is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of this context, our algorithm has to be very efficient in terms of speed and memory space, even at the cost of some loss of "quality". Also, it considers, besides insertions, deletions and updates (standard in diffs), a move operation on subtrees that is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML-specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs on average in linear time, versus quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NP-hard, the linear time is obtained by trading some quality. We present experiments (again on synthetic data) that show that the output of our algorithm is reasonably close to the "optimal" in terms of quality. Finally, we present experiments on a small sample of XML pages found on the Web.
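The signature idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it only hashes each subtree bottom-up and pairs identical subtrees, heaviest first, and it leaves out the propagation to ancestors and descendants, the use of ID attributes, and the generation of insert, delete, update and move operations. The function names `signature` and `match_unchanged_subtrees`, the choice of SHA-1, and the sample documents are assumptions made for illustration only.

```python
import hashlib
import xml.etree.ElementTree as ET


def signature(node, sigs):
    """Record (hash, size) for every node of the subtree rooted at `node`.

    The hash covers the tag, sorted attributes, stripped text, and the
    child signatures in order. Tail text is ignored in this sketch.
    """
    h = hashlib.sha1()
    h.update(node.tag.encode())
    for key in sorted(node.attrib):
        h.update(f"|{key}={node.attrib[key]}".encode())
    h.update(("|" + (node.text or "").strip()).encode())
    size = 1
    for child in node:
        child_sig, child_size = signature(child, sigs)
        h.update(child_sig.encode())
        size += child_size
    sigs[node] = (h.hexdigest(), size)
    return sigs[node]


def match_unchanged_subtrees(old_root, new_root):
    """Map nodes of the new tree to nodes of the old tree with equal signatures."""
    old_sigs, new_sigs = {}, {}
    signature(old_root, old_sigs)
    signature(new_root, new_sigs)

    # Index old subtrees by signature so each lookup is a dictionary access.
    by_sig = {}
    for node, (sig, _) in old_sigs.items():
        by_sig.setdefault(sig, []).append(node)

    matched = {}      # new node -> old node
    used_old = set()  # old nodes that already have a partner
    # Visit the heaviest new subtrees first so large unchanged regions win.
    for node in sorted(new_sigs, key=lambda n: new_sigs[n][1], reverse=True):
        if node in matched:
            continue
        candidates = [c for c in by_sig.get(new_sigs[node][0], [])
                      if c not in used_old]
        if not candidates:
            continue
        # Equal signatures imply equal shape, so document order pairs nodes.
        for new_n, old_n in zip(node.iter(), candidates[0].iter()):
            matched[new_n] = old_n
            used_old.add(old_n)
    return matched


# Hypothetical sample: one price is updated and one product subtree is moved.
OLD_XML = ("<catalog><product><name>tx123</name><price>$499</price></product>"
           "<discount><product><name>zy456</name><price>$799</price></product>"
           "</discount></catalog>")
NEW_XML = ("<catalog><product><name>tx123</name><price>$450</price></product>"
           "<product><name>zy456</name><price>$799</price></product>"
           "<discount/></catalog>")

if __name__ == "__main__":
    result = match_unchanged_subtrees(ET.fromstring(OLD_XML),
                                      ET.fromstring(NEW_XML))
    for new_node, old_node in result.items():
        print(new_node.tag, (new_node.text or "").strip())
```

In this sample the moved `product` subtree is matched as a whole even though its parent changed, which is the situation the move operation is meant to capture. The nodes left unmatched (the updated `price`, its `product` parent, `discount` and `catalog`) are the ones a propagation step would examine next.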
Pages: 41-52
Page count: 12
Related papers
50 in total
  • [1] Detecting changes to hybrid XML documents using relational Databases
    Leonardi, E
    Budiman, SL
    Bhowmick, SS
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2005, 3588 : 482 - 492
  • [2] Detecting content changes on ordered XML documents using relational databases
    Leonardi, E
    Bhowmick, SS
    Dharma, TS
    Madria, S
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2004, 3180 : 580 - 590
  • [3] XML-AD: Detecting anomalous patterns in XML documents
    Menahem, Eitan
    Schclar, Alon
    Rokach, Lior
    Elovici, Yuval
    [J]. INFORMATION SCIENCES, 2016, 326 : 71 - 88
  • [4] Xandy: Detecting changes on large unordered XML documents using relational Databases
    Leonardi, E
    Bhowmick, SS
    Madria, S
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2005, 3453 : 711 - 723
  • [5] Measuring changes in streaming XML documents
    Seaward, LM
    Saxton, LV
    [J]. PROCEEDINGS OF THE 6TH JOINT CONFERENCE ON INFORMATION SCIENCES, 2002, : 232 - 234
  • [6] Synthetising Changes in XML Documents as PULs
    Cavalieri, Federico
    Solimando, Alessandro
    Guerrini, Giovanna
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (13): : 1630 - 1641
  • [7] Representing changes in XML documents using dimensions
    Gergatsoulis, M
    Stavrakas, Y
    [J]. DATABASE AND XML TECHNOLOGIES, 2003, 2824 : 208 - 222
  • [8] Changes to XML Namespaces in XML Schemas and their Effects on Associated XML Documents under Schema Versioning
    Brahmia, Zouhaier
    Grandi, Fabio
    Bouaziz, Rafik
    [J]. 2016 ELEVENTH INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT (ICDIM 2016), 2016, : 43 - 50
  • [9] A sparse timestamp model for managing changes in XML documents
    Park, G
    Shin, W
    Kim, K
    Wu, C
    [J]. PARALLEL AND DISTRIBUTED COMPUTING: APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2004, 3320 : 882 - 886
  • [10] Mining changes from versions of dynamic XML documents
    Rusu, Laura Irina
    Rahayu, Wenny
    Taniar, David
    [J]. KNOWLEDGE DISCOVERY FROM XML DOCUMENTS, PROCEEDINGS, 2006, 3915 : 3 - 12