Document decomposition for XML compression: A heuristic approach

Author
Choi, Byron [1]
Affiliation
[1] Nanyang Technological University, Singapore, Singapore
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Sharing of common subtrees has been reported to be useful not only for XML compression but also for main-memory XML query processing. This method compresses subtrees only when they exhibit identical structure. Even slight irregularities among subtrees dramatically reduce the performance of compression algorithms of this kind. Furthermore, when XML documents are large, the chance of having a large number of identical subtrees is inherently low. In this paper, we propose a method of decomposing XML documents for better compression. We propose a heuristic method for locating minor irregularities in XML documents. The irregularities are then projected out of the original XML document. We refer to this process as document decomposition. We demonstrate that better compression can be achieved by compressing the decomposed documents separately. Experimental results show that the compressed skeletons of all real-world datasets known to us fit comfortably into the main memory of present-day commodity computers. Preliminary results on querying compressed skeletons validate the effectiveness of our approach.
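The effect described in the abstract can be illustrated with a minimal sketch. It is not the paper's heuristic: the tag to project out ("note") is chosen by hand here, and the function name `count_unique_subtrees`, the `skip_tags` parameter, and the sample document are all assumptions made for illustration. The sketch shares structurally identical subtrees by hashing each subtree's tag and child identifiers, then compares the number of distinct subtrees before and after the irregular elements are projected out.

```python
import xml.etree.ElementTree as ET


def count_unique_subtrees(root, skip_tags=frozenset()):
    """Return (nodes visited, distinct subtrees) for the tag-only skeleton.

    Elements whose tag is in skip_tags are projected out together with
    their descendants. skip_tags is a hypothetical, hand-picked stand-in
    for the irregularities that the paper locates heuristically.
    """
    table = {}  # (tag, tuple of child ids) -> id of the shared subtree
    total = 0

    def visit(node):
        nonlocal total
        total += 1
        children = tuple(visit(c) for c in node if c.tag not in skip_tags)
        # Two subtrees receive the same id only if their structure is identical.
        return table.setdefault((node.tag, children), len(table))

    visit(root)
    return total, len(table)


# Hypothetical sample: two of the four <book> subtrees carry a stray <note/>.
DOC = """<lib>
<book><title/><author/></book>
<book><title/><author/><note/></book>
<book><title/><author/></book>
<book><title/><note/><author/></book>
</lib>"""

root = ET.fromstring(DOC)
full = count_unique_subtrees(root)
decomposed = count_unique_subtrees(root, skip_tags={"note"})
print(full, decomposed)
```

In this sample the stray elements force three distinct book shapes in the original document, while the decomposed skeleton needs only one. The projected-out elements would be compressed separately, as the abstract describes.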
Pages: 202-217
Number of pages: 16