Efficient evaluation of partial match queries for XML documents using information retrieval techniques

Cited by: 0
Authors
Park, YH [1]
Whang, KY
Lee, LS
Han, WS
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon, South Korea
[2] Korea Adv Inst Sci & Technol, AITrc, Taejon, South Korea
[3] Univ Vermont, Dept Comp Sci, Burlington, VT USA
[4] Kyungpook Natl Univ, Dept Comp Engn, Taejon, South Korea
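Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract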
We propose XIR, a novel method for processing partial match queries on heterogeneous XML documents using information retrieval (IR) techniques. A partial match query is defined as one that has the descendant-or-self axis "//" in its path expression. In its general form, a partial match query has branch predicates forming branching paths. The objective of XIR is to efficiently support this type of query for large-scale documents with heterogeneous schemas. XIR is based on the conventional schema-level methods using relational tables and significantly improves their efficiency using two techniques: an inverted index technique and a novel prefix match join. The former indexes the labels in label paths in the same way as keywords in texts, and allows finding the label paths matching the queries more efficiently than the string matching used in the conventional methods. The latter supports branching path expressions, and allows finding the result nodes more efficiently than the containment joins used in the conventional methods. We compare the efficiency of XIR with that of XRel and XParent using XML documents crawled from the Internet. The results show that XIR is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions, and by several factors for branching path expressions.
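The inverted index idea in the abstract can be illustrated with a minimal sketch. It is not the XIR implementation: the abstract does not give the storage layout, so the in-memory dictionary, the posting format `(path_id, position)`, and the function names `build_inverted_index`, `parse_query`, and `match_label_paths` are all assumptions made for illustration. The sketch handles only linear path expressions (no branch predicates) and shows how label postings, rather than string matching over whole paths, can select the label paths that satisfy a query containing "//".

```python
from collections import defaultdict


def build_inverted_index(label_paths):
    """Map each label to postings (path_id, position) over all label paths."""
    index = defaultdict(list)
    for path_id, path in enumerate(label_paths):
        for pos, label in enumerate(path.strip("/").split("/")):
            index[label].append((path_id, pos))
    return index


def parse_query(query):
    """Split a linear path expression such as '//a/b' into (axis, label) steps."""
    steps, i = [], 0
    while i < len(query):
        if query.startswith("//", i):
            axis, i = "//", i + 2
        elif query.startswith("/", i):
            axis, i = "/", i + 1
        else:
            raise ValueError("expected '/' or '//' at offset %d" % i)
        j = i
        while j < len(query) and query[j] != "/":
            j += 1
        if j == i:
            raise ValueError("empty label at offset %d" % i)
        steps.append((axis, query[i:j]))
        i = j
    return steps


def match_label_paths(query, label_paths, index):
    """Return the label paths whose last label is reached by the query."""
    lengths = [len(p.strip("/").split("/")) for p in label_paths]
    current = None  # path_id -> positions where the steps so far can end
    for axis, label in parse_query(query):
        nxt = defaultdict(set)
        for path_id, pos in index.get(label, []):
            if current is None:
                # First step: '/' anchors at the root, '//' allows any depth.
                ok = axis == "//" or pos == 0
            else:
                prev = current.get(path_id)
                if not prev:
                    continue
                # '/' needs the previous label directly above; '//' anywhere above.
                ok = (pos - 1) in prev if axis == "/" else min(prev) < pos
            if ok:
                nxt[path_id].add(pos)
        current = nxt
    if not current:
        return []
    # The query must end at the final label of the label path.
    return [label_paths[pid] for pid in sorted(current)
            if lengths[pid] - 1 in current[pid]]


# Hypothetical label paths, for illustration only.
paths = [
    "/book/title",
    "/book/chapter/title",
    "/book/chapter/section/title",
    "/article/title",
    "/book/chapter/section",
]
idx = build_inverted_index(paths)
print(match_label_paths("//chapter//title", paths, idx))
```

In a schema-level method, the matching label paths would then be joined with the stored nodes to fetch the result nodes. That step, and the prefix match join for branching paths, are outside this sketch.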
Pages: 95-112
Page count: 18