Algorithms for processing the group K nearest-neighbor query on distributed frameworks

Cited by: 3
Authors
Moutafis, Panagiotis [1]
Garcia-Garcia, Francisco [2]
Mavrommatis, George [1]
Vassilakopoulos, Michael [1]
Corral, Antonio [2]
Iribarne, Luis [2]
Affiliations
[1] Univ Thessaly, Dept Elect & Comp Engn, Data Struct & Engn Lab, Volos, Greece
[2] Univ Almeria, Dept Informat, Almeria, Spain
Keywords
Spatial query processing; Group nearest-neighbor query; MapReduce algorithms; Hadoop; SpatialHadoop
DOI
10.1007/s10619-020-07317-8
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Given two datasets of points (called Query and Training), the Group K Nearest-Neighbor (GKNN) query retrieves the K points of Training with the smallest sum of distances to every point of Query. This spatial query has been studied in recent years, and several performance-improving techniques and pruning heuristics have been proposed. In previous work, we presented the first MapReduce algorithm, consisting of alternating local and parallel phases, which can be used to effectively process the GKNN query when Query fits in memory while Training belongs to the Big Data category. In this paper, we present a significantly improved algorithm that incorporates a new high-performance refining method, a fast way to calculate distance sums for pruning purposes, and several other minor coding and algorithmic improvements. Moreover, we transform this algorithm (which has been implemented in the Hadoop framework) to SpatialHadoop (a popular distributed framework dedicated to spatial processing), using a novel two-level partitioning method. Using real-world and synthetic datasets, we also present a thorough experimental study of the Hadoop and SpatialHadoop versions of the algorithm, including a backstage analysis of the algorithm's performance, using metrics that highlight its internal functioning. Finally, we present an experimental comparison of the Hadoop version, the SpatialHadoop version, and the version from our previous work, showing that the improved versions clearly outperform the earlier one, with the SpatialHadoop version being faster than its Hadoop counterpart.
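To make the query definition in the abstract concrete, the following is a minimal single-machine sketch of a brute-force GKNN computation. It is not the MapReduce algorithm of the paper and includes none of its pruning or partitioning; the function name `gknn_brute_force`, the use of Euclidean distance, and the sample points are assumptions made for illustration only.

```python
import heapq
import math


def gknn_brute_force(query, training, k):
    """Return the k Training points with the smallest sum of distances
    to all Query points (brute force, no pruning).

    Assumption: distances are Euclidean; the abstract only says "distances".
    """
    def sum_of_distances(point):
        # Sum of distances from one Training point to every Query point.
        return sum(math.dist(point, q) for q in query)

    # nsmallest keeps the k best candidates, ordered by ascending distance sum.
    return heapq.nsmallest(k, training, key=sum_of_distances)


# Hypothetical sample data, chosen only to illustrate the definition.
query = [(0.0, 0.0), (2.0, 0.0)]
training = [(1.0, 0.0), (10.0, 10.0), (1.0, 1.0), (5.0, 5.0)]
result = gknn_brute_force(query, training, 2)
print(result)
```

This sketch evaluates every Training point against every Query point, which is only practical for small inputs; the setting described in the abstract, where Training is too large for one machine, is what motivates a distributed approach.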
Pages: 733-784
Number of pages: 52