Efficient evaluation of partial match queries for XML documents using information retrieval techniques

Cited by: 0
Authors
Park, YH [1]
Whang, KY
Lee, LS
Han, WS
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon, South Korea
[2] Korea Adv Inst Sci & Technol, AITrc, Taejon, South Korea
[3] Univ Vermont, Dept Comp Sci, Burlington, VT USA
[4] Kyungpook Natl Univ, Dept Comp Engn, Taejon, South Korea
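Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract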
We propose XIR, a novel method for processing partial match queries on heterogeneous XML documents using information retrieval (IR) techniques. A partial match query is defined as one that has the descendant-or-self axis "//" in its path expression. In its general form, a partial match query has branch predicates forming branching paths. The objective of XIR is to efficiently support this type of query for large-scale documents with heterogeneous schemas. XIR is based on the conventional schema-level methods using relational tables and significantly improves their efficiency using two techniques: an inverted index technique and a novel prefix match join. The former indexes the labels in label paths as keywords in texts, and allows for finding the label paths matching the queries more efficiently than the string match used in the conventional methods. The latter supports branching path expressions, and allows for finding the result nodes more efficiently than the containment joins used in the conventional methods. We compare the efficiency of XIR with that of XRel and XParent using XML documents crawled from the Internet. The results show that XIR is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions, and by a factor of several for branching path expressions.
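The inverted index idea in the abstract, treating the labels of a label path as keywords in a text, can be illustrated with a minimal sketch. This is not XIR's actual implementation: the function names `build_index` and `match_descendant_query`, the in-memory dictionaries (the abstract describes relational tables), and the restriction to linear queries made only of "//" steps are all assumptions made for illustration. The prefix match join for branching paths is not covered.

```python
from collections import defaultdict


def build_index(label_paths):
    """Index each label as a keyword (hypothetical helper, not from the paper).

    postings[label][path_id] holds the sorted positions at which the label
    occurs in that label path; lengths[path_id] is the number of labels.
    """
    postings = defaultdict(lambda: defaultdict(list))
    lengths = []
    for path_id, path in enumerate(label_paths):
        labels = path.strip("/").split("/")
        lengths.append(len(labels))
        for pos, label in enumerate(labels):
            postings[label][path_id].append(pos)
    return postings, lengths


def match_descendant_query(postings, lengths, query_labels):
    """Return ids of label paths matching //l1//l2//...//ln.

    Assumption: every step uses the "//" axis, so the query labels must
    occur in order in the label path and the last query label must be the
    last label of the path.
    """
    if not query_labels:
        return []
    # Step 1: keyword-style lookup. Keep only paths containing every label.
    candidates = None
    for label in query_labels:
        ids = set(postings.get(label, {}))
        candidates = ids if candidates is None else candidates & ids
    # Step 2: verify label order using the stored positions.
    matches = []
    for path_id in sorted(candidates):
        prev = -1
        ok = True
        for i, label in enumerate(query_labels):
            positions = postings[label][path_id]
            if i == len(query_labels) - 1:
                # The final label must sit at the end of the label path.
                last = lengths[path_id] - 1
                ok = last in positions and last > prev
            else:
                # Greedily take the earliest occurrence after the previous one.
                nxt = next((p for p in positions if p > prev), None)
                if nxt is None:
                    ok = False
                else:
                    prev = nxt
            if not ok:
                break
        if ok:
            matches.append(path_id)
    return matches


paths = [
    "/book/chapter/section/title",
    "/book/title",
    "/article/section/para",
    "/book/chapter/title/note",
]
postings, lengths = build_index(paths)
result = match_descendant_query(postings, lengths, ["book", "title"])
```

In this sketch, the query `//book//title` is answered by intersecting the postings of `book` and `title` and then checking positions, rather than by running a string pattern match over every stored label path.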
Pages: 95-112
Page count: 18
Related papers
50 in total
  • [21] Development of an XML information retrieval system for queries on contents and structures
    Shimizu, Toshiyuki
    Terada, Norimasa
    Yoshikawa, Masatoshi
    ICKS 2007: SECOND INTERNATIONAL CONFERENCE ON INFORMATICS RESEARCH FOR DEVELOPMENT OF KNOWLEDGE SOCIETY INFRASTRUCTURE, PROCEEDINGS, 2007, : 161 - +
  • [22] Efficient evaluation of XML path queries with automata
    Sun, B
    Lv, JH
    Wang, GR
    Yu, G
    Zhou, B
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2003, 2762 : 116 - 127
  • [23] An Expansion Method of XML Element Retrieval Techniques into Web Documents
    Keyaki, Atsushi
    Miyazaki, Jun
    Hatano, Kenji
    2014 IIAI 3RD INTERNATIONAL CONFERENCE ON ADVANCED APPLIED INFORMATICS (IIAI-AAI 2014), 2014, : 853 - 858
  • [24] An expressive and efficient language for XML information retrieval
    Chinenyanga, TT
    Kushmerick, N
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2002, 53 (06): : 438 - 453
  • [25] Enrichment of text documents using information retrieval techniques in a distributed environment
    Bueno, Francisco
    Garcia-Serrano, Ana
    Martinez-Fernandez, Jose L.
    EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (12) : 8348 - 8358
  • [26] A Framework for Efficient Information Retrieval Using NLP Techniques
    Subhashini, R.
    Kumar, V. Jawahar Senthil
    COMPUTER NETWORKS AND INFORMATION TECHNOLOGIES, 2011, 142 : 391 - +
  • [27] Efficient evaluation of XML middle-ware queries
    Fernandez, M
    Morishima, A
    Suciu, D
    SIGMOD RECORD, 2001, 30 (02) : 103 - 114
  • [28] Integrated Partial Match Query in Geographic Information Retrieval
    Zainol, Rosilawati
    Abu Bakar, Zainab
    Ali, Sayed Jamaludin Sayed
    PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY, 2011, 19 : 41 - 49
  • [29] Efficient evaluation of multiple queries on streamed XML fragments
    Huo, Huan
    Zhou, Rui
    Wang, Guoren
    Hui, Xiaoyun
    Xiao, Chuan
    Yu, Yongqian
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2006, 4016 : 61 - 72
  • [30] EFFICIENT EVALUATION OF XML TWIG QUERIES WITH KEYWORD CONSTRAINTS
    Chang, Ya-Hui
    Luo, Chieh-Chang
    Huang, Chih-Chung
    JOURNAL OF THE CHINESE INSTITUTE OF ENGINEERS, 2009, 32 (04) : 469 - 480