Efficient filtering of XML documents with XPath expressions

Cited by: 86
Authors
Chan, CY [1]
Felber, P [1]
Garofalakis, M [1]
Rastogi, R [1]
Affiliation
[1] Bell Labs, Lucent Technol, Murray Hill, NJ 07974 USA
Source
VLDB JOURNAL | 2002 / Vol. 11 / No. 04
Keywords
data dissemination; document filtering; index structure; XML; XPath
DOI
10.1007/s00778-002-0077-6
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The publish/subscribe paradigm is a popular model for allowing publishers (i.e., data generators) to selectively disseminate data to a large number of widely dispersed subscribers (i.e., data consumers) who have registered their interest in specific information items. Early publish/subscribe systems have typically relied on simple subscription mechanisms, such as keyword or "bag of words" matching, or simple comparison predicates on attribute values. The emergence of XML as a standard for information exchange on the Internet has led to an increased interest in using more expressive subscription mechanisms (e.g., based on XPath expressions) that exploit both the structure and the content of published XML documents. Given the increased complexity of these new data-filtering mechanisms, the problem of effectively identifying the subscription profiles that match an incoming XML document poses a difficult and important research challenge. In this paper, we propose a novel index structure, termed XTrie, that supports the efficient filtering of XML documents based on XPath expressions. Our XTrie index structure offers several novel features that, we believe, make it especially attractive for large-scale publish/subscribe systems. First, XTrie is designed to support effective filtering based on complex XPath expressions (as opposed to simple, single-path specifications). Second, our XTrie structure and algorithms are designed to support both ordered and unordered matching of XML data. Third, by indexing on sequences of elements organized in a trie structure and using a sophisticated matching algorithm, XTrie is able to both reduce the number of unnecessary index probes and avoid redundant matchings, thereby providing extremely efficient filtering. Our experimental results over a wide range of XML document and XPath expression workloads demonstrate that our XTrie index structure outperforms earlier approaches by wide margins.
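The general idea of sharing work across subscriptions by storing element sequences in a trie can be illustrated with a small sketch. This is not the XTrie structure or its matching algorithm, whose details the abstract does not give; it is a much simpler prefix trie that assumes subscriptions are absolute, child-axis-only paths such as /catalog/book/title (no wildcards, descendant steps, or predicates). The names PathTrie and match_document are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


class PathTrie:
    """Hypothetical trie keyed by element names (not the paper's XTrie)."""

    def __init__(self):
        self.children = {}        # element name -> PathTrie
        self.subscribers = set()  # ids of subscriptions ending at this node

    def add(self, path, sub_id):
        """Register a simple absolute path such as '/a/b/c'."""
        node = self
        for step in path.strip("/").split("/"):
            node = node.children.setdefault(step, PathTrie())
        node.subscribers.add(sub_id)


def match_document(trie, xml_text):
    """Return the ids of all subscriptions whose path occurs in the document."""
    root = ET.fromstring(xml_text)
    matched = set()
    # Walk the document and the trie together; a subtree is abandoned
    # as soon as no subscription shares its path prefix.
    stack = [(root, trie)]
    while stack:
        elem, node = stack.pop()
        child = node.children.get(elem.tag)
        if child is None:
            continue
        matched |= child.subscribers
        for sub_elem in elem:
            stack.append((sub_elem, child))
    return matched


if __name__ == "__main__":
    trie = PathTrie()
    trie.add("/catalog/book/title", "s1")
    trie.add("/catalog/book/author", "s2")
    trie.add("/catalog/cd", "s3")
    doc = "<catalog><book><title>T</title></book></catalog>"
    print(match_document(trie, doc))
```

Because the three example subscriptions share the prefix /catalog, the document's root element is compared against that step once rather than once per subscription, which is the kind of saving a trie-organized index is meant to provide.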
Pages: 354 - 379
Number of pages: 26
Related papers
50 items in total
  • [41] Decidable classes of documents for XPath
    Bárány, Vince
    Bojańczyk, Mikolaj
    Figueira, Diego
    Parys, Pawel
    Leibniz International Proceedings in Informatics, LIPIcs, 2012, 18 : 99 - 111
  • [42] Normalization of the Forward XPath for Efficient Query Evaluation over XML Data Streams
    Qiao, Lixiang
    Yang, Zhimin
    Yang, Chi
    Ren, Kaijun
    Liu, Chang
    JCPC: 2009 JOINT CONFERENCE ON PERVASIVE COMPUTING, 2009, : 365 - +
  • [43] Phil: A Lazy Implementation of a Language for Approximate Filtering of XML Documents
    Baggi, M.
    Ballis, D.
    ELECTRONIC NOTES IN THEORETICAL COMPUTER SCIENCE, 2008, 216 : 93 - 109
  • [44] Generating XML Data for XPath Queries
    Rychnovsky, Dusan
    Holubova, Irena
    30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 724 - 731
  • [45] Efficient evaluation of linear path expressions on large-scale heterogeneous XML documents using information retrieval techniques
    Park, YH
    Whang, KY
    Lee, BS
    Han, WS
    JOURNAL OF SYSTEMS AND SOFTWARE, 2006, 79 (02) : 180 - 190
  • [46] Filtering unsatisfiable Xpath queries
    Groppe, Jinghua
    Groppe, Sven
    ICEIS 2006: PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATIONAL SYSTEMS: DATABASES AND INFORMATION SYSTEMS INTEGRATION, 2006, : 157 - +
  • [47] Filtering unsatisfiable XPath queries
    Groppe, Jinghua
    Groppe, Sven
    DATA & KNOWLEDGE ENGINEERING, 2008, 64 (01) : 134 - 169
  • [48] A Non Redundant Compact XML Storage for Efficient Indexing and Querying of XML Documents
    Atique, Mohammed
    Raut, A. D.
    GLOBAL TRENDS IN COMPUTING AND COMMUNICATION SYSTEMS, PT 1, 2012, 269 : 109 - +
  • [49] Processing XPath expressions in relational databases
    Pankowski, T
    SOFSEM 2004: THEORY AND PRACTICE OF COMPUTER SCIENCE, PROCEEDINGS, 2004, 2932 : 265 - 276
  • [50] Efficient Processing XPath Queries by Compressed XML Query Tree based on Structural Index
    Zhang, Haiwei
    Hu, Xiangyu
    Zhang, Ying
    Wen, Yanlong
    Yuan, Xiaojie
    MECHATRONICS AND INTELLIGENT MATERIALS, PTS 1 AND 2, 2011, 211-212 : 726 - 730