Algorithms for processing the group K nearest-neighbor query on distributed frameworks

Cited by: 3
Authors
Moutafis, Panagiotis [1]
Garcia-Garcia, Francisco [2]
Mavrommatis, George [1]
Vassilakopoulos, Michael [1]
Corral, Antonio [2]
Iribarne, Luis [2]
Affiliations
[1] Univ Thessaly, Dept Elect & Comp Engn, Data Struct & Engn Lab, Volos, Greece
[2] Univ Almeria, Dept Informat, Almeria, Spain
Keywords
Spatial query processing; Group nearest-neighbor query; MapReduce algorithms; Hadoop; SpatialHadoop
DOI
10.1007/s10619-020-07317-8
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Given two datasets of points (called Query and Training), the Group K Nearest-Neighbor (GKNN) query retrieves the K points of Training with the smallest sum of distances to every point of Query. This spatial query has been studied in recent years, and several performance-improving techniques and pruning heuristics have been proposed. In previous work, we presented the first MapReduce algorithm, consisting of alternating local and parallel phases, which can be used to effectively process the GKNN query when the Query dataset fits in memory while the Training dataset belongs to the Big Data category. In this paper, we present a significantly improved algorithm that incorporates a new high-performance refining method, a fast way to calculate distance sums for pruning purposes, and several other minor coding and algorithmic improvements. Moreover, we port this algorithm (which has been implemented in the Hadoop framework) to SpatialHadoop (a popular distributed framework dedicated to spatial processing), using a novel two-level partitioning method. Using real-world and synthetic datasets, we also present a thorough experimental study of the Hadoop and SpatialHadoop versions of the algorithm, including a behind-the-scenes analysis of the algorithm's performance based on metrics that highlight its internal functioning. Finally, we present an experimental comparison of the Hadoop version, the SpatialHadoop version, and the version from our previous work, showing that the improved versions clearly outperform the earlier one, with the SpatialHadoop version being faster than its Hadoop counterpart.
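To make the query definition in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the MapReduce algorithm of the paper and omits the paper's refining and pruning methods; it only evaluates the GKNN definition directly. The function names (`sum_of_distances`, `gknn_brute_force`, `gknn_partitioned`), the use of Euclidean distance on 2D points, and the sample data are all assumptions made for illustration. The partitioned variant merely shows why a per-partition top-K followed by a global merge gives the same answer, which is the general idea that makes the query suitable for distributed evaluation.

```python
import heapq
import math


def sum_of_distances(p, query):
    """Sum of Euclidean distances from point p to every Query point."""
    return sum(math.dist(p, q) for q in query)


def gknn_brute_force(training, query, k):
    """Return the k Training points with the smallest distance sums.

    Illustrative only: scans every Training point, with no pruning.
    """
    return heapq.nsmallest(k, training, key=lambda p: sum_of_distances(p, query))


def gknn_partitioned(partitions, query, k):
    """Hypothetical two-step evaluation over pre-split Training partitions.

    Each partition contributes its local top-k (a map-like step); the
    global top-k is then chosen among those candidates (a reduce-like step).
    """
    candidates = []
    for part in partitions:
        candidates.extend(gknn_brute_force(part, query, k))
    return gknn_brute_force(candidates, query, k)


# Hypothetical sample data: a small in-memory Query and a tiny Training set.
query = [(0.0, 0.0), (2.0, 0.0)]
training = [(1.0, 0.0), (5.0, 5.0), (1.0, 1.0), (-3.0, 0.0)]

result = gknn_brute_force(training, query, 2)
split_result = gknn_partitioned([training[:2], training[2:]], query, 2)
print(result)
print(split_result)
```

In this toy setting both calls select the same two points, because any point in the global top-K must also be in the top-K of its own partition. The brute-force scan costs one distance computation per (Training point, Query point) pair, which is the cost that pruning heuristics such as those mentioned in the abstract aim to reduce.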
Pages: 733-784
Number of pages: 52