Efficient processing of distributed Iceberg Semi-Joins

Times cited: 0
Authors
Imthiyaz, MK [1]
Dong, XA [1]
Kalnis, P [1]
Affiliation
[1] Natl Univ Singapore, Dept Comp Sci, Singapore 117543, Singapore
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Iceberg Semi-Join (ISJ) of two datasets R and S returns the tuples in R that join with at least k tuples of S. The ISJ operator is essential in many practical applications, including OLAP, data mining, and information retrieval. In this paper we consider the distributed evaluation of Iceberg Semi-Joins, where R and S reside on remote servers. We develop an efficient algorithm that employs Bloom filters. The novelty of our approach is that we interleave the evaluation of the Iceberg set at server S with the pruning of unmatched tuples at server R. As a result, we are able to (i) eliminate unnecessary tuples early, and (ii) extract accurate Bloom filters from the intermediate hash tables constructed while generating the Iceberg set. Our experiments demonstrate that, compared to conventional two-phase approaches, our method transmits up to 80% less data through the network while also reducing disk I/O cost.
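To make the ISJ definition and the role of Bloom filters concrete, the Python sketch below shows a centralized reference evaluation and a generic two-round Bloom-filter plan between the two servers. Formally, for join attributes R.a and S.b, the result is { r in R : |{ s in S : s.b = r.a }| >= k }. This is an illustration under stated assumptions, not the authors' interleaved algorithm: the helper names (BloomFilter, iceberg_semijoin_reference, iceberg_semijoin_bloom), the salted SHA-256 hashing, the filter size, and the message flow (R sends a filter of its join keys, S returns the exact set of keys with at least k matches) are all hypothetical choices.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Minimal Bloom filter (hypothetical helper): no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def iceberg_semijoin_reference(R, S, k, key_r, key_s):
    """Centralized definition: tuples of R that join with at least k tuples of S."""
    counts = Counter(key_s(s) for s in S)
    return [r for r in R if counts[key_r(r)] >= k]


def iceberg_semijoin_bloom(R, S, k, key_r, key_s, num_bits=1024, num_hashes=3):
    """Hypothetical two-round plan (NOT the paper's interleaved algorithm).

    Round 1 (R -> S): R ships a Bloom filter of its join keys.
    Round 2 (S -> R): S counts only tuples that pass the filter and returns the
    exact set of keys with count >= k; R keeps its tuples whose key is in that set.
    """
    # At server R: summarize the join keys in a compact filter.
    bf = BloomFilter(num_bits, num_hashes)
    for r in R:
        bf.add(key_r(r))

    # At server S: prune tuples whose key cannot occur in R, then count.
    # No false negatives, so counts for keys that really occur in R are exact.
    counts = Counter(key_s(s) for s in S if bf.might_contain(key_s(s)))
    iceberg_keys = {key for key, c in counts.items() if c >= k}

    # Back at server R: an exact membership test discards any false positives.
    return [r for r in R if key_r(r) in iceberg_keys]


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (3, "c")]
    S = [(1,), (1,), (1,), (2,), (2,), (4,), (4,), (4,)]
    print(iceberg_semijoin_bloom(R, S, 2, lambda r: r[0], lambda s: s[0]))
    # prints [(1, 'a'), (2, 'b')]
```

Because a Bloom filter never yields false negatives, every S tuple whose key occurs in R survives the pruning step, so the counts S computes for those keys are exact; false positives only cost extra counting work at S. The sketch therefore returns the same answer as the reference evaluation, while the paper instead derives its Bloom filters from the hash tables built during generation of the Iceberg set.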
Pages: 634-643
Number of pages: 10