Document decomposition for XML compression: A heuristic approach

Cited by: 0
Authors
Choi, Byron [1]
Affiliation
[1] Nanyang Technol Univ, Singapore, Singapore
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Sharing of common subtrees has been reported to be useful not only for XML compression but also for main-memory XML query processing. This method compresses subtrees only when they exhibit identical structure. Even slight irregularities among subtrees dramatically reduce the performance of compression algorithms of this kind. Furthermore, when XML documents are large, the chance of having a large number of identical subtrees is inherently low. In this paper, we propose a method of decomposing XML documents for better compression. We propose a heuristic method of locating minor irregularities in XML documents. The irregularities are then projected out from the original XML document. We refer to this process as document decomposition. We demonstrate that better compression can be achieved by compressing the decomposed documents separately. Experimental results demonstrate that the compressed skeletons of all real-world datasets known to us fit comfortably into the main memory of today's commodity computers. Preliminary results on querying compressed skeletons validate the effectiveness of our approach.
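The effect described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only counts how many distinct subtrees a tag skeleton contains when identical subtrees are shared, and it mimics projection by skipping a hand-picked set of tags. The function name `count_unique_subtrees`, the parameter `skip_tags`, and the sample document are all hypothetical, and the heuristic that locates irregularities is not modeled here.

```python
import xml.etree.ElementTree as ET


def count_unique_subtrees(root, skip_tags=frozenset()):
    """Return (visited_nodes, unique_subtrees) for the tag skeleton of `root`.

    Two subtrees are shared only when their tag and their ordered list of
    child subtrees are identical. Elements whose tag is in `skip_tags` are
    left out, a stand-in (assumption) for projecting irregularities out.
    """
    table = {}  # (tag, child ids) -> id of the shared subtree
    visited = 0

    def visit(node):
        nonlocal visited
        visited += 1
        children = tuple(visit(c) for c in node if c.tag not in skip_tags)
        key = (node.tag, children)
        # A new key receives the next free id; a known key reuses its id.
        return table.setdefault(key, len(table))

    visit(root)
    return visited, len(table)


# Hypothetical document: the third <book> carries one extra <note/> child.
DOC = """
<lib>
  <book><title/><author/></book>
  <book><title/><author/></book>
  <book><title/><author/><note/></book>
</lib>
""".strip()

root = ET.fromstring(DOC)
print(count_unique_subtrees(root))                      # (11, 6)
print(count_unique_subtrees(root, skip_tags={"note"}))  # (10, 4)
```

In this toy input, the single extra `note` element prevents the third `book` from being shared with the other two. Once it is left out, all three `book` subtrees collapse into one entry. In a real decomposition the projected elements would be kept in a separate document and compressed on their own rather than discarded.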
Pages: 202-217
Page count: 16