Efficient evaluation of partial match queries for XML documents using information retrieval techniques

Cited by: 0
Authors
Park, YH [1]
Whang, KY
Lee, LS
Han, WS
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon, South Korea
[2] Korea Adv Inst Sci & Technol, AITrc, Taejon, South Korea
[3] Univ Vermont, Dept Comp Sci, Burlington, VT USA
[4] Kyungpook Natl Univ, Dept Comp Engn, Taejon, South Korea
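Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract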
We propose XIR, a novel method for processing partial match queries on heterogeneous XML documents using information retrieval (IR) techniques. A partial match query is defined as one having the descendant-or-self axis "//" in its path expression. In its general form, a partial match query has branch predicates forming branching paths. The objective of XIR is to efficiently support this type of query for large-scale documents of heterogeneous schemas. XIR is based on the conventional schema-level methods using relational tables and significantly improves their efficiency using two techniques: an inverted index technique and a novel prefix match join. The former indexes the labels in label paths as keywords in texts, and allows for finding the label paths matching the queries more efficiently than the string match used in the conventional methods. The latter supports branching path expressions, and allows for finding the result nodes more efficiently than the containment joins used in the conventional methods. We compare the efficiency of XIR with that of XRel and XParent using XML documents crawled from the Internet. The results show that XIR is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions, and by several factors for branching path expressions.
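The inverted-index idea in the abstract (treating the labels of each label path as keywords in a text) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not XIR's actual implementation: the function names, the in-memory dictionary layout, and the sample label paths are all hypothetical, only linear path expressions are handled, and the prefix match join for branching paths is not shown.

```python
import re
from collections import defaultdict


def split_path(label_path):
    """Split a label path such as '/bib/book/author' into its labels."""
    return label_path.strip("/").split("/")


def build_inverted_index(label_paths):
    """Map each label to {path_id: [positions of that label in the path]}."""
    index = defaultdict(dict)
    for pid, path in enumerate(label_paths):
        for pos, label in enumerate(split_path(path)):
            index[label].setdefault(pid, []).append(pos)
    return index


def parse_query(query):
    """Turn '//a/b//c' into [('//', 'a'), ('/', 'b'), ('//', 'c')]."""
    return re.findall(r"(//|/)([^/]+)", query)


def match_label_paths(index, label_paths, query):
    """Return the label paths that a linear path expression matches."""
    steps = parse_query(query)
    if not steps:
        return []
    # One posting list per query label; intersect the path ids first.
    postings = [index.get(label, {}) for _, label in steps]
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)

    matches = []
    for pid in sorted(candidates):
        reachable = {-1}  # virtual position just before the root label
        for (axis, _), posting in zip(steps, postings):
            reachable = {
                p for p in posting[pid]
                if any((p == q + 1) if axis == "/" else (p > q)
                       for q in reachable)
            }
            if not reachable:
                break
        # The last query label must be the last label of the path.
        if len(split_path(label_paths[pid])) - 1 in reachable:
            matches.append(label_paths[pid])
    return matches


# Hypothetical label paths, as might be extracted from heterogeneous documents.
paths = [
    "/bib/book/author",
    "/bib/book/title",
    "/bib/article/author",
    "/catalog/book/chapter/author",
]
idx = build_inverted_index(paths)
print(match_label_paths(idx, paths, "//book//author"))
```

In this sketch, a query first narrows the candidates by intersecting posting lists, and only then checks label order and adjacency by position, so label paths that lack any query label are never scanned as strings.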
Pages: 95-112
Page count: 18