Algorithms for processing the group K nearest-neighbor query on distributed frameworks

Cited by: 3
Authors
Moutafis, Panagiotis [1]
Garcia-Garcia, Francisco [2]
Mavrommatis, George [1]
Vassilakopoulos, Michael [1]
Corral, Antonio [2]
Iribarne, Luis [2]
Affiliations
[1] Univ Thessaly, Dept Elect & Comp Engn, Data Struct & Engn Lab, Volos, Greece
[2] Univ Almeria, Dept Informat, Almeria, Spain
Keywords
Spatial query processing; Group nearest-neighbor query; MapReduce algorithms; Hadoop; SpatialHadoop
DOI
10.1007/s10619-020-07317-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Given two datasets of points (called Query and Training), the Group K Nearest-Neighbor (GKNN) query retrieves the K points of the Training dataset with the smallest sum of distances to every point of the Query dataset. This spatial query has been studied in recent years, and several performance-improving techniques and pruning heuristics have been proposed. In previous work, we presented the first MapReduce algorithm, consisting of alternating local and parallel phases, which can be used to effectively process the GKNN query when the Query dataset fits in memory, while the Training dataset belongs to the Big Data category. In this paper, we present a significantly improved algorithm that incorporates a new high-performance refining method, a fast way to calculate distance sums for pruning purposes, and several other minor coding and algorithmic improvements. Moreover, we transform this algorithm (which has been implemented in the Hadoop framework) to SpatialHadoop (a popular distributed framework that is dedicated to spatial processing), using a novel two-level partitioning method. Using real-world and synthetic datasets, we also present a thorough experimental study of the Hadoop and SpatialHadoop versions of the algorithm, including a backstage analysis of the algorithm's performance, using metrics that highlight its internal functioning. Finally, we present an experimental comparison of the Hadoop version, the SpatialHadoop version, and the version of our previous work, showing that the improved versions clearly outperform the earlier one, with the SpatialHadoop version being faster than its Hadoop counterpart.
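To make the query definition in the abstract concrete, the following is a minimal single-machine sketch of a brute-force GKNN evaluation. It is not the MapReduce algorithm of the paper and includes none of its pruning or partitioning techniques; the function name `gknn_brute_force`, the use of Euclidean distance, and the sample points are assumptions chosen purely for illustration.

```python
import heapq
import math


def gknn_brute_force(query, training, k):
    """Return the k training points with the smallest sum of
    distances to all query points (hypothetical helper, brute force)."""

    def sum_of_distances(p):
        # Sum of Euclidean distances from p to every query point.
        return sum(math.dist(p, q) for q in query)

    # nsmallest returns the k best candidates in ascending order of the key.
    return heapq.nsmallest(k, training, key=sum_of_distances)


# Illustrative data only (not from the paper).
query = [(0.0, 0.0), (2.0, 0.0)]
training = [(1.0, 0.0), (1.0, 5.0), (10.0, 10.0), (0.0, 1.0)]
result = gknn_brute_force(query, training, 2)
print(result)
```

This scans every Training point against every Query point, which is only practical for small inputs; it serves as a reference definition against which a distributed implementation could be checked.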
Pages: 733 - 784
Number of pages: 52
Related papers
50 in total
  • [31] Efficient Processing of Probabilistic Group Nearest Neighbor Query on Uncertain Data
    Li, Jiajia
    Wang, Botao
    Wang, Guoren
    Bi, Xin
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2014, PT I, 2014, 8421 : 436 - 450
  • [32] Efficient Processing of Relevant Nearest-Neighbor Queries
    Efstathiades, Christodoulos
    Efentakis, Alexandros
    Pfoser, Dieter
    ACM TRANSACTIONS ON SPATIAL ALGORITHMS AND SYSTEMS, 2016, 2 (03)
  • [33] Multiple k nearest neighbor query processing in spatial network databases
    Huang, Xuegang
    Jensen, Christian S.
    Saltenis, Simonas
    ADVANCES IN DATABASES AND INFORMATION SYSTEMS, PROCEEDINGS, 2006, 4152 : 266 - 281
  • [34] Efficient k Nearest Neighbor Query Processing on Public Transportation Network
    Li, Jiajia
    Zhang, Lingyun
    Ni, Cancan
    An, Yunzhe
    Zong, Chuanyu
    Zhang, Anzhen
    2021 IEEE 20TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2021), 2021 : 1108 - 1115
  • [35] Top-k spatial preference query for group nearest neighbor
    Computing Center, Northeastern University, Shenyang 110819, China
    Unknown, 114051, China
    Unknown, 110819, China
    Dongbei Daxue Xuebao, 10 (1412-1415 and 1421)
  • [36] On semantic caching and query scheduling for mobile nearest-neighbor search
    Zheng, BH
    Lee, WC
    Lee, DL
    WIRELESS NETWORKS, 2004, 10 (06) : 653 - 664
  • [37] On Semantic Caching and Query Scheduling for Mobile Nearest-Neighbor Search
    Baihua Zheng
    Wang-Chien Lee
    Dik Lun Lee
    Wireless Networks, 2004, 10 : 653 - 664
  • [38] OPTIMIZATION OF K NEAREST-NEIGHBOR DENSITY ESTIMATES
    FUKUNAGA, K
    HOSTETLER, LD
    IEEE TRANSACTIONS ON INFORMATION THEORY, 1973, 19 (03) : 320 - 326
  • [39] EFFECTIVE ALGORITHMS FOR THE NEAREST-NEIGHBOR METHOD IN THE CLUSTERING PROBLEM
    HATTORI, K
    TORII, Y
    PATTERN RECOGNITION, 1993, 26 (05) : 741 - 746
  • [40] NEAREST-NEIGHBOR HEURISTICS IN ACCELERATED ALGORITHMS OF OPTIMIZATION PROBLEMS
    LIN, SC
    HSUEH, HC
    PHYSICA A, 1994, 203 (3-4) : 369 - 380