Algorithms for processing the group K nearest-neighbor query on distributed frameworks

Cited by: 3
Authors
Moutafis, Panagiotis [1]
Garcia-Garcia, Francisco [2]
Mavrommatis, George [1]
Vassilakopoulos, Michael [1]
Corral, Antonio [2]
Iribarne, Luis [2]
Affiliations
[1] Univ Thessaly, Dept Elect & Comp Engn, Data Struct & Engn Lab, Volos, Greece
[2] Univ Almeria, Dept Informat, Almeria, Spain
Keywords
Spatial query processing; Group nearest-neighbor query; MapReduce algorithms; Hadoop; SpatialHadoop
DOI
10.1007/s10619-020-07317-8
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Given two datasets of points (called Query and Training), the Group K Nearest-Neighbor (GKNN) query retrieves the K points of the Training dataset with the smallest sum of distances to every point of the Query dataset. This spatial query has been studied in recent years, and several performance-improving techniques and pruning heuristics have been proposed. In previous work, we presented the first MapReduce algorithm, consisting of alternating local and parallel phases, which can be used to effectively process the GKNN query when the Query dataset fits in memory while the Training dataset belongs to the Big Data category. In this paper, we present a significantly improved algorithm that incorporates a new high-performance refining method, a fast way to calculate distance sums for pruning purposes, and several other minor coding and algorithmic improvements. Moreover, we transform this algorithm (which has been implemented in the Hadoop framework) to SpatialHadoop (a popular distributed framework dedicated to spatial processing), using a novel two-level partitioning method. Using real-world and synthetic datasets, we also present a thorough experimental study of the Hadoop and SpatialHadoop versions of the algorithm, including a backstage analysis of the algorithm's performance, using metrics that highlight its internal functioning. Finally, we present an experimental comparison of the Hadoop version, the SpatialHadoop version, and the version from our previous work, showing that the improved versions clearly outperform the earlier one, with the SpatialHadoop version being faster than its Hadoop counterpart.
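To make the query definition in the abstract concrete, the following is a minimal brute-force sketch in Python, not the authors' MapReduce algorithm. It assumes 2D points and Euclidean distance. All function names (`sum_of_distances`, `gknn_brute_force`, `gknn_partitioned`) are hypothetical and introduced only for illustration. The partitioned variant only imitates the general idea of computing local candidates per data partition and merging them; it includes none of the pruning heuristics, refining method, or two-level partitioning mentioned in the abstract.

```python
import heapq
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]          # assumption: points are 2D
Scored = Tuple[float, Point]         # (sum of distances, training point)


def sum_of_distances(p: Point, query: Sequence[Point]) -> float:
    """Sum of Euclidean distances from p to every point of the Query set."""
    return sum(math.dist(p, q) for q in query)


def gknn_brute_force(training: Iterable[Point],
                     query: Sequence[Point],
                     k: int) -> List[Scored]:
    """Return the k Training points with the smallest distance sums.

    Scans every Training point once; results are sorted by ascending sum.
    """
    scored = ((sum_of_distances(p, query), p) for p in training)
    return heapq.nsmallest(k, scored)


def gknn_partitioned(partitions: Iterable[Iterable[Point]],
                     query: Sequence[Point],
                     k: int) -> List[Scored]:
    """Illustrative two-step version: local top-k per partition, then merge.

    The global top-k is always contained in the union of the local top-k
    lists, so merging them gives the same answer as a full scan.
    """
    local_results = [gknn_brute_force(part, query, k) for part in partitions]
    merged = (cand for result in local_results for cand in result)
    return heapq.nsmallest(k, merged)


if __name__ == "__main__":
    query = [(0.0, 0.0), (2.0, 0.0)]
    training = [(1.0, 0.0), (5.0, 5.0), (1.0, 1.0), (10.0, 0.0)]
    for total, point in gknn_brute_force(training, query, k=2):
        print(point, round(total, 3))
```

The Query set is passed as an in-memory sequence, mirroring the abstract's setting in which the Query fits in memory while the Training dataset is split across partitions.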
Pages: 733-784
Number of pages: 52
Source
Distributed and Parallel Databases, 2021, 39: 733-784