An efficient parallel processing method for skyline queries in MapReduce

Authors
Junsu Kim
Myoung Ho Kim
Affiliation
[1] KAIST, School of Computing
Source
Keywords
Skyline query processing; Parallel processing; Distributed processing; MapReduce; Distributed systems; Big data;
DOI
Not available
CLC number
Subject classification number
Abstract
Skyline queries are useful for finding only the interesting tuples in multi-dimensional datasets for multi-criteria decision making. To improve the performance of skyline query processing on large-scale data, it is necessary to use parallel and distributed frameworks such as MapReduce, which has been widely adopted in recent years. Several approaches process skyline queries on a MapReduce framework to improve query performance. Some methods perform part of the skyline computation serially, while others perform all parts of it in parallel. However, each of them suffers from at least one of two drawbacks: (1) the serial computations may prevent them from fully exploiting the parallelism of the MapReduce framework; (2) when skyline queries are processed in a parallel and distributed manner, the additional overhead of parallel processing may outweigh the benefit gained from parallelization. To process skyline queries over large data efficiently in parallel, we propose a novel two-phase approach in the MapReduce framework. In the first phase, we divide the input dataset into a number of subsets (called cells) and then compute local skylines only for the qualified cells. The outer-cell filter used in this phase considerably improves performance by eliminating a large number of tuples in unqualified cells. In the second phase, the global skyline is computed from the local skylines. To determine the global skyline tuples from each local skyline separately and in parallel, we design the inner-cell filter and also propose efficient methods to reduce the overhead of computing and utilizing the inner-cell filters. The primary advantage of our approach is that, with the two filtering techniques, it processes skyline queries quickly and in a fully parallelized manner in all stages of the MapReduce framework.
Through extensive experiments, we demonstrate that the proposed approach substantially improves the overall performance of skyline queries compared with state-of-the-art skyline processing methods. In particular, the proposed method achieves remarkably good performance and scalability with respect to dataset size and dimensionality. Our approach offers significant benefits for large-scale skyline query processing in distributed and parallel computing environments.
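The abstract outlines the two-phase pipeline but does not specify the exact form of the outer-cell and inner-cell filters. The Python sketch below simulates both phases in a single process to make the data flow concrete; it is not the authors' method. The following are all assumptions: an equal-width grid over [0, 1)^d, "smaller is better" in every dimension, a cell-to-cell dominance test standing in for the outer-cell filter, and the rule that each local skyline is checked only against cells that are no larger in every dimension (standing in for the inner-cell filter). Every function name (`phase1`, `phase2`, `grid_skyline` and the helpers) is hypothetical. In an actual MapReduce job, the grouping by cell would be the shuffle, and each cell's local skyline and each phase-2 check would run as independent reduce tasks.

```python
from collections import defaultdict


def dominates(p, q):
    """True if p dominates q: no worse in every dimension, strictly better in one.
    Assumption: smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def cell_of(p, k):
    """Grid cell holding p, assuming every coordinate lies in [0, 1)
    and each dimension is split into k equal-width slices (hypothetical grid)."""
    return tuple(min(int(x * k), k - 1) for x in p)


def fully_dominates(a, b):
    """True if every point of cell a dominates every point of cell b."""
    return all(x + 1 <= y for x, y in zip(a, b))


def weakly_precedes(b, c):
    """True if cell b is <= cell c in every dimension; only such cells
    can contain tuples that dominate tuples of c."""
    return all(x <= y for x, y in zip(b, c))


def local_skyline(points):
    """Block-nested-loop skyline of one cell's tuples."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def phase1(points, k):
    """Phase 1: partition into cells, prune unqualified cells, compute local skylines."""
    # "Map": key every tuple by its grid cell.
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, k)].append(p)
    # Stand-in outer-cell filter: drop any cell fully dominated by a non-empty cell.
    qualified = [c for c in cells if not any(fully_dominates(o, c) for o in cells)]
    # "Reduce": each qualified cell's local skyline is independent of the others.
    return {c: local_skyline(cells[c]) for c in qualified}


def phase2(local):
    """Phase 2: check each local skyline only against cells that could dominate it."""
    result = []
    for c, sky in local.items():
        # Stand-in inner-cell filter: candidate dominators come from preceding cells only.
        rivals = [q for b, s in local.items() if b != c and weakly_precedes(b, c) for q in s]
        result.extend(p for p in sky if not any(dominates(q, p) for q in rivals))
    return result


def grid_skyline(points, k):
    return phase2(phase1(points, k))


if __name__ == "__main__":
    pts = [(0.1, 0.9), (0.2, 0.2), (0.5, 0.5), (0.6, 0.6), (0.9, 0.1), (0.95, 0.95)]
    print(sorted(phase1(pts, 2)))        # [(0, 0), (0, 1), (1, 0)]: cell (1, 1) is pruned
    print(sorted(grid_skyline(pts, 2)))  # [(0.1, 0.9), (0.2, 0.2), (0.9, 0.1)]
```

The phase-2 restriction loses nothing. If q dominates p, then q is no larger than p in every coordinate, so q's cell is no larger than p's cell in every dimension. A dominating global-skyline tuple therefore always appears either in p's own local skyline or among the rivals drawn from preceding cells.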
Pages: 886-935
Page count: 49