Detecting changes in XML documents

Cited by: 131
Authors
Cobéna, G [1]
Abiteboul, S [1]
Marian, A [1]
Affiliation
[1] INRIA, Rocquencourt, France
DOI
10.1109/ICDE.2002.994696
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project, which is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of this context, our algorithm has to be very efficient in terms of speed and memory space, even at the cost of some loss of "quality". Besides the insertions, deletions and updates that are standard in diffs, it also considers a move operation on subtrees, which is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML-specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs on average in linear time, versus quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NP-hard, the linear time is obtained by trading some quality. We present experiments (again on synthetic data) that show that the output of our algorithm is reasonably close to the "optimal" in terms of quality. Finally, we present experiments on a small sample of XML pages found on the Web.
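The signature-matching idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `signature` and `match_unchanged_subtrees`, the choice of SHA-1, and the definition of weight as a node count are all assumptions made for illustration. The sketch covers only the first step (matching identical subtrees, heaviest first) and leaves out the propagation to ancestors and descendants, ID attributes, and the generation of move, insert, delete and update operations.

```python
import hashlib
import heapq
import xml.etree.ElementTree as ET
from itertools import count


def signature(node, table):
    """Hash a subtree bottom-up; store (signature, weight) per node in `table`.

    Assumption: weight is the number of elements in the subtree, and
    tail text is ignored to keep the sketch short.
    """
    h = hashlib.sha1()
    h.update(node.tag.encode())
    for name, value in sorted(node.attrib.items()):
        h.update(f"|{name}={value}".encode())
    h.update(("#" + (node.text or "").strip()).encode())
    weight = 1
    for child in node:
        child_sig, child_weight = signature(child, table)
        h.update(child_sig.encode())
        weight += child_weight
    table[node] = (h.hexdigest(), weight)
    return table[node]


def match_unchanged_subtrees(old_root, new_root):
    """Return (old_node, new_node) pairs whose subtrees are identical."""
    old_sigs, new_sigs = {}, {}
    signature(old_root, old_sigs)
    signature(new_root, new_sigs)

    # Index every old node by the signature of its subtree.
    by_sig = {}
    for node in old_root.iter():
        by_sig.setdefault(old_sigs[node][0], []).append(node)

    used = set()      # old nodes already covered by a match
    matches = []
    tie = count()     # tie-breaker so the heap never compares elements
    heap = [(-new_sigs[new_root][1], next(tie), new_root)]
    while heap:
        _, _, node = heapq.heappop(heap)  # heaviest new subtree first
        sig = new_sigs[node][0]
        candidate = next(
            (o for o in by_sig.get(sig, []) if o not in used), None
        )
        if candidate is not None:
            # The whole subtree is unchanged: match it, do not descend.
            used.update(candidate.iter())
            matches.append((candidate, node))
        else:
            for child in node:
                heapq.heappush(
                    heap, (-new_sigs[child][1], next(tie), child)
                )
    return matches


old = ET.fromstring("<doc><a><b>x</b><c>y</c></a><d>z</d></doc>")
new = ET.fromstring("<doc><d>z</d><a><b>x</b><c>y</c></a><e>new</e></doc>")
pairs = match_unchanged_subtrees(old, new)
print([new_node.tag for _, new_node in pairs])
```

In this example the subtrees rooted at `a` and `d` are matched even though they changed position, which is the situation a move operation would describe, while `e` is left unmatched as a candidate insertion.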
Pages: 41-52
Page count: 12