Pivot-Based Distributed K-Nearest Neighbor Mining

Cited by: 2
Authors
Kuhlman, Caitlin [1]
Yan, Yizhou [1]
Cao, Lei [2]
Rundensteiner, Elke [1]
Affiliations
[1] Worcester Polytech Inst, Worcester, MA 01609 USA
[2] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT II | 2017 / Vol. 10535
Keywords
K-nearest neighbor search; Distributed computing; MapReduce
DOI
10.1007/978-3-319-71246-8_51
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
k-nearest neighbor (kNN) search is a fundamental data mining task critical to many data analytics methods. Yet no effective techniques to date scale kNN search to large datasets. In this work we present PkNN, an exact distributed method that, by leveraging modern distributed architectures, scales kNN search to billion-point datasets for the first time. The key to the PkNN strategy is a multi-round kNN search that exploits pivot-based data partitioning at each stage. This includes an outlier-driven partition adjustment mechanism that effectively minimizes data duplication and achieves a balanced workload across the compute cluster. Aggressive data-driven bounds, along with a tiered support assignment strategy, ensure correctness while limiting computation costs. Our experimental study on multi-dimensional real-world data demonstrates that PkNN achieves significant speedup over the state of the art and scales effectively in data cardinality. Code and data related to this chapter are available at: http://solar-10.wpi.edu/cakuhlman/PkNN.
Pages: 843-860
Page count: 18
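
The abstract describes pivot-based partitioning followed by a multi-round kNN search whose bounds limit how much data each point must examine. The single-machine sketch below illustrates that general idea only and is not the authors' PkNN implementation: it omits distribution, the outlier-driven partition adjustment, and the tiered support assignment. The function names `pivot_knn` and `knn_in` are hypothetical, the pivots are assumed to be given, and the pruning rule is the standard generalized hyperplane bound for metric spaces, which may differ from the bounds used in the paper.

```python
import heapq
import math
import random


def knn_in(query_idx, candidates, points, k):
    """Return the k nearest candidates to points[query_idx] as sorted (distance, index) pairs."""
    q = points[query_idx]
    return heapq.nsmallest(
        k, ((math.dist(q, points[j]), j) for j in candidates if j != query_idx)
    )


def pivot_knn(points, pivots, k):
    """Exact kNN for every point, pruning cells with a pivot-based lower bound."""
    # Partitioning: each point joins the cell of its closest pivot.
    to_pivot = [[math.dist(p, v) for v in pivots] for p in points]
    home = [min(range(len(pivots)), key=row.__getitem__) for row in to_pivot]
    cells = [[] for _ in pivots]
    for i, c in enumerate(home):
        cells[c].append(i)

    result = []
    for i in range(len(points)):
        # Round 1: kNN restricted to the point's own cell.
        local = knn_in(i, cells[home[i]], points, k)
        # The k-th local distance bounds the true kNN radius;
        # it is infinite when the cell holds fewer than k other points.
        radius = local[-1][0] if len(local) == k else math.inf

        # Round 2: add only the cells that the bound cannot rule out.
        candidates = list(cells[home[i]])
        for c, members in enumerate(cells):
            if c == home[i]:
                continue
            # Generalized hyperplane bound: every x in cell c satisfies
            # d(p, x) >= (d(p, pivot_c) - d(p, pivot_home)) / 2.
            lower = (to_pivot[i][c] - to_pivot[i][home[i]]) / 2
            if lower <= radius:
                candidates.extend(members)
        result.append([j for _, j in knn_in(i, candidates, points, k)])
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    pivots = rng.sample(pts, 8)  # assumption: pivots drawn at random from the data
    neighbors = pivot_knn(pts, pivots, k=5)
    print(neighbors[0])  # indices of the 5 nearest neighbors of point 0
```

In a distributed setting, the cells whose bound cannot be ruled out correspond to the extra data a partition would need to receive, which is why tighter bounds reduce duplication. This sketch makes no attempt to balance cell sizes.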