Efficient evaluation of partial match queries for XML documents using information retrieval techniques

Cited by: 0
Authors
Park, YH [1]
Whang, KY
Lee, BS
Han, WS
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon, South Korea
[2] Korea Adv Inst Sci & Technol, AITrc, Taejon, South Korea
[3] Univ Vermont, Dept Comp Sci, Burlington, VT USA
[4] Kyungpook Natl Univ, Dept Comp Engn, Taejon, South Korea
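Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract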
We propose XIR, a novel method for processing partial match queries on heterogeneous XML documents using information retrieval (IR) techniques. A partial match query is defined as one having the descendant-or-self axis "//" in its path expression. In its general form, a partial match query has branch predicates forming branching paths. The objective of XIR is to efficiently support this type of query for large-scale documents of heterogeneous schemas. XIR is based on the conventional schema-level methods using relational tables and significantly improves their efficiency using two techniques: an inverted index technique and a novel prefix match join. The former indexes the labels in label paths as keywords in texts, and allows for finding the label paths matching the queries more efficiently than the string match used in the conventional methods. The latter supports branching path expressions, and allows for finding the result nodes more efficiently than the containment joins used in the conventional methods. We compare the efficiency of XIR with those of XRel and XParent using XML documents crawled from the Internet. The results show that XIR is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions, and by several factors for branching path expressions.
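The inverted index idea in the abstract can be illustrated with a minimal sketch. This is not the XIR implementation: the function names (`build_label_index`, `find_label_paths`), the in-memory data layout, and the sample label paths are all assumptions made for illustration. It treats each label in a label path as a keyword with a position, then answers a linear partial match query by intersecting posting lists and checking positions, rather than string-matching every label path.

```python
import re
from collections import defaultdict


def build_label_index(label_paths):
    """Index each label as a keyword: label -> path id -> positions (1-based).

    Hypothetical helper for illustration; not taken from the paper.
    """
    index = defaultdict(lambda: defaultdict(list))
    lengths = {}
    for pid, path in enumerate(label_paths):
        labels = path.strip("/").split("/")
        lengths[pid] = len(labels)
        for pos, label in enumerate(labels, start=1):
            index[label][pid].append(pos)
    return index, lengths


def find_label_paths(label_paths, index, lengths, query):
    """Return the label paths matched by a linear query such as '/a//b/c'."""
    # Split the query into (axis, label) steps, e.g. [('/', 'a'), ('//', 'b')].
    steps = re.findall(r"(//|/)([^/]+)", query)
    # Candidate paths must contain every label that occurs in the query.
    candidates = set.intersection(
        *[set(index.get(label, {})) for _, label in steps]
    )
    matches = []
    for pid in sorted(candidates):
        reachable = {0}  # position 0 stands for the virtual document root
        for axis, label in steps:
            positions = index[label][pid]
            if axis == "/":
                # Child axis: the label must directly follow the previous step.
                reachable = {p for p in positions if p - 1 in reachable}
            else:
                # Descendant axis: the label may appear anywhere later.
                earliest = min(reachable)
                reachable = {p for p in positions if p > earliest}
            if not reachable:
                break
        # The last query step must land on the last label of the path.
        if lengths[pid] in reachable:
            matches.append(label_paths[pid])
    return matches


paths = [
    "/site/regions/item/name",
    "/site/people/person/name",
    "/site/regions/item/description/text",
    "/catalog/item/name",
]
index, lengths = build_label_index(paths)
print(find_label_paths(paths, index, lengths, "//item/name"))
print(find_label_paths(paths, index, lengths, "/site//name"))
```

In this sketch only the label paths that contain every query label are examined, which is the intuition behind preferring an inverted index to a scan with string matching. Branching path expressions and the prefix match join are not covered here.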
Pages: 95-112
Page count: 18
Related papers
  • [1] Efficient evaluation of linear path expressions on large-scale heterogeneous XML documents using information retrieval techniques
    Park, YH
    Whang, KY
    Lee, BS
    Han, WS
    JOURNAL OF SYSTEMS AND SOFTWARE, 2006, 79 (02) : 180 - 190
  • [2] Distributed processing of queries for XML documents in an agent based information retrieval system
    Czejdo, B
    Miller, R
    Taylor, M
    Rusinkiewicz, M
    2000 KYOTO INTERNATIONAL CONFERENCE ON DIGITAL LIBRARIES: RESEARCH AND PRACTICE, PROCEEDINGS, 2000, : 246 - 253
  • [3] Efficient Storage and Retrieval of XML Documents Using XQuery
    Chiu, Yu-Bin
    Chen, Huei-Huang
    Liu, Chu-Yen
    Chen, Shih-Chih
    Hung, Chung-Wen
    MATERIALS, TRANSPORTATION AND ENVIRONMENTAL ENGINEERING, PTS 1 AND 2, 2013, 779-780 : 1685 - +
  • [4] On efficient matching of streaming XML documents and queries
    Lakshmanan, LVS
    Parthasarathy, S
    ADVANCES IN DATABASE TECHNOLOGY - EDBT 2002, 2002, 2287 : 142 - 160
  • [5] Documents, data, information retrieval, & XML
    Fichter, D
    Cervone, F
    ONLINE, 2000, 24 (06): : 30 - +
  • [6] System of information retrieval in XML documents
    Smadhi, S
    ISSUES AND TRENDS OF INFORMATION TECHNOLOGY MANAGEMENT IN CONTEMPORARY ORGANIZATIONS, VOLS 1 AND 2, 2002, : 736 - 739
  • [7] Flexible information retrieval on XML documents
    Grabs, T
    Schek, HJ
    INTELLIGENT SEARCH ON XML DATA: APPLICATIONS, LANGUAGES, MODELS IMPLEMENTATIONS AND BENCHMARKS, 2003, 2818 : 95 - 106
  • [8] Efficient Tree Pattern Queries On Encrypted XML Documents
    Rao, Fang-Yu
    Cao, Jianneng
    Kuzu, Mehmet
    Bertino, Elisa
    Kantarcioglu, Murat
    TRANSACTIONS ON DATA PRIVACY, 2013, 6 (03) : 199 - 226
  • [9] Evaluation of XPath Queries Over XML Documents Using SparkSQL Framework
    Hricov, Radoslav
    Senk, Adam
    Kroha, Petr
    Valenta, Michal
    BEYOND DATABASES, ARCHITECTURES AND STRUCTURES: TOWARDS EFFICIENT SOLUTIONS FOR DATA ANALYSIS AND KNOWLEDGE REPRESENTATION, 2017, 716 : 28 - 41
  • [10] EFFICIENT RETRIEVAL OF PARTIAL DOCUMENTS
    ZOBEL, J
    MOFFAT, A
    WILKINSON, R
    SACKSDAVIS, R
    INFORMATION PROCESSING & MANAGEMENT, 1995, 31 (03) : 361 - 377