Distributed query evaluation on semistructured data

Cited by: 40
Author
Suciu, D
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
[2] AT&T Corp, Shannon Labs, New York, NY 10013 USA
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2002 / Vol. 27 / No. 1
Keywords
algorithm; languages; theory; distributed evaluation; nested queries; parallel complexity; regular expressions; semistructured data
DOI
10.1145/507234.507235
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Semistructured data is modeled as a rooted, labeled graph. The simplest kinds of queries on such data are those which traverse paths described by regular path expressions. More complex queries combine several regular path expressions with complex data restructuring and with subqueries. This article addresses the problem of efficient query evaluation on distributed, semistructured databases. In our setting, the nodes of the database are distributed over a fixed number of sites, and the edges are classified into local edges (with both ends in the same site) and cross edges (with ends in two distinct sites). Efficient evaluation in this context means that the number of communication steps is fixed (independent of the data or the query), and that the total amount of data sent depends only on the number of cross links and on the size of the query's result. We give such algorithms in three different settings: first, for the simple case of queries consisting of a single regular expression; second, for all queries in a calculus for graphs based on structural recursion, which in addition to regular path expressions can perform nontrivial restructuring of the graph; and third, for a class of queries we call select-where queries, which combine pattern matching and regular path expressions with data restructuring and subqueries. This article also includes a discussion of how these methods can be used to derive efficient view maintenance algorithms.
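The first setting in the abstract (a single regular path expression over a graph whose nodes are spread across sites) can be illustrated with a minimal Python sketch. It is not the article's algorithm; it is one simple way to get a fixed number of communication rounds under stated assumptions. The encoding is hypothetical: edges are `(src, label, dst)` triples, `site_of` maps each node to a site, and the path expression is assumed to be already compiled into an NFA given by `delta`, `start`, and `accepting`. Each site explores only its local edges, once for every pair of an input node (the root or the target of an incoming cross edge) and automaton state. It reports which nodes it reaches in an accepting state and which `(node, state)` pairs it hands off across cross edges. A coordinator then runs a reachability search over these summaries.

```python
from collections import deque

# Hypothetical encoding (not from the article):
#   edges   : list of (src, label, dst) triples
#   site_of : dict node -> site id
#   delta   : dict (state, label) -> set of next NFA states for the path expression
#   start, accepting : NFA start state and set of accepting states


def local_summary(site, edges, site_of, delta, accepting, inputs, states):
    """Work done independently at one site, touching only edges stored there.

    For every (input node, NFA state) pair, returns (results, exits):
    results = local nodes reached in an accepting state,
    exits   = (node, state) pairs on other sites reached through a cross edge.
    """
    out_edges = {}
    for src, label, dst in edges:
        if site_of[src] == site:  # a site stores only the edges leaving its own nodes
            out_edges.setdefault(src, []).append((label, dst))
    summary = {}
    for u in inputs:
        for q in states:
            seen, todo = {(u, q)}, deque([(u, q)])
            results, exits = set(), set()
            while todo:
                x, p = todo.popleft()
                if p in accepting:
                    results.add(x)
                for label, y in out_edges.get(x, []):
                    for p2 in delta.get((p, label), ()):
                        if site_of[y] != site:
                            exits.add((y, p2))  # cross edge: hand off to y's site
                        elif (y, p2) not in seen:
                            seen.add((y, p2))
                            todo.append((y, p2))
            summary[(u, q)] = (results, exits)
    return summary


def distributed_eval(root, edges, site_of, delta, start, accepting):
    states = {start} | set(accepting) | {q for q, _ in delta}
    states |= {p for ps in delta.values() for p in ps}
    sites = set(site_of.values())
    # Input nodes of a site: the root (if stored there) and targets of incoming cross edges.
    inputs = {s: set() for s in sites}
    inputs[site_of[root]].add(root)
    for src, _, dst in edges:
        if site_of[src] != site_of[dst]:
            inputs[site_of[dst]].add(dst)
    # One communication round: every site sends its summary to the coordinator
    # (simulated sequentially here; the sites do not depend on each other).
    summary = {}
    for s in sites:
        summary.update(local_summary(s, edges, site_of, delta, accepting, inputs[s], states))
    # Coordinator: reachability from (root, start) over the combined summaries.
    answer, seen, todo = set(), {(root, start)}, deque([(root, start)])
    while todo:
        results, exits = summary[todo.popleft()]
        answer |= results
        for pair in exits:
            if pair not in seen:
                seen.add(pair)
                todo.append(pair)
    return answer


def centralized_eval(root, edges, delta, start, accepting):
    """Reference: product search of graph x NFA on a single site, for comparison."""
    out_edges = {}
    for src, label, dst in edges:
        out_edges.setdefault(src, []).append((label, dst))
    answer, seen, todo = set(), {(root, start)}, deque([(root, start)])
    while todo:
        x, p = todo.popleft()
        if p in accepting:
            answer.add(x)
        for label, y in out_edges.get(x, []):
            for p2 in delta.get((p, label), ()):
                if (y, p2) not in seen:
                    seen.add((y, p2))
                    todo.append((y, p2))
    return answer


if __name__ == "__main__":
    # Path expression a.b* as an NFA: 0 --a--> 1, 1 --b--> 1, state 1 accepting.
    delta = {(0, "a"): {1}, (1, "b"): {1}}
    site_of = {"r": 1, "w": 1, "z": 1, "x": 2, "y": 2, "v": 2}
    edges = [("r", "a", "x"), ("r", "c", "w"), ("x", "b", "y"),
             ("y", "b", "z"), ("z", "a", "v")]
    print(sorted(distributed_eval("r", edges, site_of, delta, 0, {1})))  # ['x', 'y', 'z']
    print(sorted(centralized_eval("r", edges, delta, 0, {1})))           # ['x', 'y', 'z']
```

The per-site searches do not depend on where the query starts, which is what lets every site work in parallel in one round. The cost is that each site summarizes every (input node, state) pair. This simplified version also ships result nodes for pairs the coordinator never reaches, so it does not meet the bound stated in the abstract. One possible refinement is to send only the cross-edge summaries first, then ask each site for the result nodes of the reachable pairs in a second round.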
Pages: 1-62
Page count: 62