Compressing XML documents using recursive finite state automata

Cited by: 0
Authors
Subramanian, H [1 ]
Shankar, P [1 ]
Affiliation
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
We propose a scheme for automatically generating compressors for XML documents from Document Type Definition (DTD) specifications. Our algorithm is lossless and adaptive: the model used for compression and decompression is generated automatically from the DTD and is used in conjunction with an arithmetic compressor to produce a compressed version of the document. The structure of the model mirrors the syntactic specification of the document. Our compression scheme is on-line, that is, it can compress the document as it is being read. We have implemented the compressor generator and report the results of experiments on some large XML databases whose DTDs are specified. We note that the average compression is better than that of XMLPPM, the only other on-line tool we are aware of. The tool is also able to compress massive documents on which XMLPPM failed because it ran out of memory. We believe the main appeal of this technique is that the underlying model is so simple and yet so effective.
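The idea in the abstract, a model whose structure mirrors the DTD and feeds adaptive probabilities to an arithmetic coder, can be illustrated with a minimal sketch. This is not the authors' implementation. The DTD, the hand-built automata, and the names `AUTOMATA`, `AdaptiveModel`, and `structure_bits` are all hypothetical, text content is ignored, and the arithmetic coder is replaced by its ideal code length, -log2(p), summed over the structural choices.

```python
import math

# Hypothetical DTD, for illustration only:
#   <!ELEMENT book (title, author+, chapter*)>
#   <!ELEMENT chapter (title)>
#   <!ELEMENT title (#PCDATA)>
#   <!ELEMENT author (#PCDATA)>
# Each content model is hand-compiled into a small automaton:
# state -> {symbol: next_state}. The symbol END closes the element.
END = "</>"
AUTOMATA = {
    "book": {
        0: {"title": 1},
        1: {"author": 2},
        2: {"author": 2, "chapter": 3, END: None},
        3: {"chapter": 3, END: None},
    },
    "chapter": {0: {"title": 1}, 1: {END: None}},
    "title": {0: {END: None}},
    "author": {0: {END: None}},
}


class AdaptiveModel:
    """Per-state symbol counts, updated after every coded symbol."""

    def __init__(self):
        # Every symbol the automaton allows in a state starts with count 1.
        self.counts = {
            (elem, state): {sym: 1 for sym in arcs}
            for elem, states in AUTOMATA.items()
            for state, arcs in states.items()
        }

    def cost_and_update(self, elem, state, symbol):
        # A symbol the DTD does not allow here raises KeyError.
        table = self.counts[(elem, state)]
        p = table[symbol] / sum(table.values())
        table[symbol] += 1  # adapt; a decoder would make the same update
        return -math.log2(p)  # ideal arithmetic-code length in bits


def structure_bits(node, model):
    """Ideal code length of the element structure of node = (name, children)."""
    name, children = node
    state, bits = 0, 0.0
    for child in children:
        bits += model.cost_and_update(name, state, child[0])
        bits += structure_bits(child, model)  # recurse into the child's automaton
        state = AUTOMATA[name][state][child[0]]
    bits += model.cost_and_update(name, state, END)
    return bits


doc = ("book", [("title", []), ("author", []), ("author", []),
                ("chapter", [("title", [])])])
print(round(structure_bits(doc, AdaptiveModel()), 3))
```

In this sketch, states with a single legal symbol cost zero bits, so only genuine choices (another `author`, a `chapter`, or the end of `book`) contribute to the output. Because the counts are updated one symbol at a time, the document can be coded as it is read, and a decoder holding the same automata can replay the same updates.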
Pages: 282-293
Page count: 12
Related papers
50 in total
  • [1] Person name identification in Chinese documents using finite state automata
    Shen, B
    Zhang, ZF
    Yuan, CF
    [J]. IEEE/WIC INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY, PROCEEDINGS, 2003, : 478 - 481
  • [2] Updates, Schema Updates and Validation of XML Documents - Using Abstract State Machines with Automata-Defined States
    Schewe, Klaus-Dieter
    Thalheim, Bernhard
    Wang, Qing
    [J]. JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2009, 15 (10) : 2028 - 2057
  • [3] FINITE AUTOMATA ON RECURSIVE CLOSURE OF FINITE GRAPHS
    REDKO, SE
    [J]. DOPOVIDI AKADEMII NAUK UKRAINSKOI RSR SERIYA A-FIZIKO-MATEMATICHNI TA TECHNICHNI NAUKI, 1981, (05): : 82 - 85
  • [4] The complexity of compressing subsegments of images described by finite automata
    Karhumäki, J
    Plandowski, W
    Rytter, W
    [J]. DISCRETE APPLIED MATHEMATICS, 2003, 125 (2-3) : 235 - 254
  • [5] Constructing finite state automata for high-performance XML web services
    van Engelen, RA
    [J]. IC'04: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTERNET COMPUTING, VOLS 1 AND 2, 2004, : 975 - 981
  • [6] Spelling Correction for Text Documents in Bahasa Indonesia Using Finite State Automata and Levinshtein Distance Method
    Mawardi, Viny Christanti
    Susanto, Niko
    Naga, Dali Santun
    [J]. 3RD INTERNATIONAL CONFERENCE ON ELECTRICAL SYSTEMS, TECHNOLOGY AND INFORMATION (ICESTI 2017), 2018, 164
  • [7] An XML world wide web search engine using finite automata
    Hu, WC
    Yeh, JH
    Lin, WC
    [J]. 6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL I, PROCEEDINGS: INFORMATION SYSTEMS DEVELOPMENT I, 2002, : 89 - 94
  • [8] Image coding using finite state automata
    Quenneville, C
    Meunier, J
    [J]. OPTICAL ENGINEERING, 1996, 35 (01) : 113 - 118
  • [9] Deterministic finite automata with recursive calls and DPDAs
    Gallier, JH
    La Torre, S
    Mukhopadhyay, S
    [J]. INFORMATION PROCESSING LETTERS, 2003, 87 (04) : 187 - 193
  • [10] Jointly Extracting and Compressing Documents with Summary State Representations
    Mendes, Afonso
    Narayan, Shashi
    Miranda, Sebastiao
    Marinho, Zita
    Martins, Andre F. T.
    Cohen, Shay B.
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 3955 - 3966