PI-Join: Efficiently processing join queries on massive data

Cited by: 7
Authors
Han, Xixian [2]
Li, Jianzhong [2, 3]
Yang, Donghua [1]
Affiliations
[1] Harbin Inst Technol, Acad Fundamental & Interdisciplinary Sci, Harbin, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[3] Harbin Inst Technol, Dept Comp Sci & Engn, Harbin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Massive data; PI-join; JPIPT construction stage; Result output stage; INDEX STRUCTURE; PERFORMANCE;
DOI
10.1007/s10115-011-0429-x
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
The ratio of disk capacity to disk transfer rate typically increases by 10x per decade. As a result, disks appear slower from the application's point of view, because applications must store and process much larger data volumes. In database systems, the smaller the data volume involved in query processing, the better the performance. The disk-based join is a common but time-consuming database operation, especially on massive data, where I/O cost dominates execution time. However, current join algorithms are suitable only for small or moderate data volumes. They incur high I/O cost on massive data because they make multiple I/O passes over the joined tables and are insensitive to join selectivity. This paper proposes PI-Join, a novel disk-based join algorithm that can efficiently process join queries involving massive data. PI-Join consists of two stages: the JPIPT construction stage (JCS) and the result output stage (ROS). JCS runs a cache-conscious construction algorithm on the join attributes, which are stored in a column-oriented model, to obtain the join positional index pair table (JPIPT) of the join results quickly. ROS then uses the JPIPT to retrieve the results with a one-pass sequential selective scan on each table. We provide a correctness proof and cost analysis of PI-Join. Our experimental results indicate that PI-Join has a significant advantage over existing join algorithms.
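The two-stage structure described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' algorithm: the abstract does not specify the cache-conscious construction method, so a plain hash build stands in for it, and Python lists stand in for disk-resident tables. The function names `build_jpipt`, `selective_scan`, and `pi_join_sketch` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def build_jpipt(r_keys, s_keys):
    """Stage 1 (JCS analogue): pair row positions whose join keys match.

    Only the join-attribute columns are read, mirroring the column-oriented
    storage of join attributes. A hash build is an assumed stand-in for the
    paper's cache-conscious construction algorithm.
    """
    positions_in_s = defaultdict(list)
    for j, key in enumerate(s_keys):
        positions_in_s[key].append(j)
    return [
        (i, j)
        for i, key in enumerate(r_keys)
        for j in positions_in_s.get(key, ())
    ]


def selective_scan(table, wanted_positions):
    """One sequential pass over a table, keeping only the wanted positions."""
    wanted = set(wanted_positions)
    return {pos: row for pos, row in enumerate(table) if pos in wanted}


def pi_join_sketch(r_rows, s_rows, r_keys, s_keys):
    """Stage 2 (ROS analogue): fetch matching rows and emit joined tuples."""
    jpipt = build_jpipt(r_keys, s_keys)
    r_hit = selective_scan(r_rows, (i for i, _ in jpipt))
    s_hit = selective_scan(s_rows, (j for _, j in jpipt))
    return [r_hit[i] + s_hit[j] for i, j in jpipt]


if __name__ == "__main__":
    r_rows = [("a", 1), ("b", 2), ("c", 3)]
    s_rows = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    r_keys = [row[1] for row in r_rows]  # join column of R
    s_keys = [row[0] for row in s_rows]  # join column of S
    print(pi_join_sketch(r_rows, s_rows, r_keys, s_keys))
    # [('b', 2, 2, 'x'), ('c', 3, 3, 'y'), ('c', 3, 3, 'z')]
```

The point of the sketch is the separation of concerns: the positional pair table is built from the join columns alone, and each full table is then read once, skipping rows that no pair refers to. When few rows match, little of either table has to be kept.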
Pages: 527-557
Page count: 31
Related papers
50 in total
  • [1] PI-Join: Efficiently processing join queries on massive data
    Xixian Han
    Jianzhong Li
    Donghua Yang
    [J]. Knowledge and Information Systems, 2012, 32 : 527 - 557
  • [2] Efficiently processing (p, ε)-approximate join aggregation on massive data
    Han, Xixian
    Li, Jianzhong
    Gao, Hong
    [J]. INFORMATION SCIENCES, 2014, 278 : 773 - 792
  • [3] Computing Complex Temporal Join Queries Efficiently
    Hu, Xiao
    Sintos, Stavros
    Gao, Junyang
    Agarwal, Pankaj K.
    Yang, Jun
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022 : 2076 - 2090
  • [4] Join Queries on Uncertain Data: Semantics and Efficient Processing
    Ge, Tingjian
    [J]. IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011), 2011 : 697 - 708
  • [5] Processing distance join queries with constraints
    Papadopoulos, AN
    Nanopoulos, A
    Manolopoulos, Y
    [J]. COMPUTER JOURNAL, 2006, 49 (03) : 281 - 296
  • [6] Fast Processing of Join Queries with Instant Response
    Hamdi, Mohammed
    Yu, Feng
    Hou, Wen-Chi
    [J]. 2017 COMPUTING CONFERENCE, 2017 : 352 - 362
  • [7] Secure mediation of join queries by processing ciphertexts
    Biskup, Joachim
    Tsatedem, Christian
    Wiese, Lena
    [J]. 2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOP, VOLS 1-2, 2007 : 715 - 724
  • [8] Processing Top-k Join Queries
    Wu, Minji
    Berti-Equille, Laure
    Marian, Amelie
    Procopiuc, Cecilia M.
    Srivastava, Divesh
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2010, 3 (01) : 860 - 870
  • [9] Adaptive and incremental processing for distance join queries
    Shin, H
    Moon, B
    Lee, S
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2003, 15 (06) : 1561 - 1578
  • [10] Efficient processing of multiple structural join queries
    Subramanyam, GV
    Kumar, PS
    [J]. KEY TECHNOLOGIES FOR DATA MANAGEMENT, 2004, 3112 : 112 - 123