k Nearest Neighbor Similarity Join Algorithm on High-Dimensional Data Using Novel Partitioning Strategy

Citations: 0
Authors
Ma, Youzhong [1, 2]
Hua, Qiaozhi [3 ]
Wen, Zheng [4 ]
Zhang, Ruiling [1 ]
Zhang, Yongxin [1 ]
Li, Haipeng [5 ]
Affiliations
[1] Luoyang Normal Univ, Sch Informat & Technol, Luoyang 471934, Peoples R China
[2] Henan Key Lab Big Data Proc & Analyt Elect Commer, Luoyang 471934, Peoples R China
[3] Hubei Univ Arts & Sci, Comp Sch, Xiangyang 441000, Peoples R China
[4] Waseda Univ, Sch Fundamental Sci & Engn, Tokyo 1698050, Japan
[5] Capinfo Co Ltd, Beijing 100010, Peoples R China
Funding
National Natural Science Foundation of China; Japan Society for the Promotion of Science;
Keywords
INTERNET; REDUCTION; THINGS;
DOI
10.1155/2022/1249393
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The k nearest neighbor similarity join on high-dimensional data has broad applications in many fields, but several key challenges remain for this task, such as the "curse of dimensionality" and the large scale of the datasets. We propose a new dimensionality reduction scheme based on the random projection technique, then design two novel partition strategies, an equal-width partition strategy and a distance split tree-based partition strategy, and finally propose a k nearest neighbor join algorithm for high-dimensional data based on these partition strategies. We conduct comprehensive experiments to test the performance of the proposed approaches, and the experimental results show that the proposed methods achieve good effectiveness and performance.
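The pipeline named in the abstract (random projection, then partitioning, then a kNN join) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the Gaussian projection matrix, the choice to partition on the first projected coordinate, and the "own bucket plus neighbors" candidate rule are all assumptions made here for illustration, and the distance split tree strategy is not shown.

```python
import math
import random


def random_projection(points, target_dim, seed=0):
    """Project points to target_dim dimensions with a Gaussian random matrix.

    Assumption: entries are N(0, 1) scaled by 1/sqrt(target_dim).
    """
    rng = random.Random(seed)
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
              for _ in range(target_dim)]
    return [[sum(w * x for w, x in zip(row, p)) for row in matrix]
            for p in points]


def equal_width_partition(values, num_parts):
    """Assign each scalar to one of num_parts equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_parts or 1.0  # avoid zero width when all values are equal
    return [min(int((v - lo) / width), num_parts - 1) for v in values]


def knn_join(r_points, s_points, k):
    """Exact brute-force kNN join: for each r, indices of its k nearest s."""
    result = []
    for r in r_points:
        order = sorted(range(len(s_points)),
                       key=lambda j: math.dist(r, s_points[j]))
        result.append(order[:k])
    return result


def partitioned_knn_join(r_points, s_points, k,
                         target_dim=2, num_parts=4, seed=0):
    """Approximate kNN join (hypothetical design, for illustration only).

    Both sets are projected together, bucketed on the first projected
    coordinate, and each r is compared only with s points in its own or
    an adjacent bucket. Candidates are ranked by original-space distance.
    """
    projected = random_projection(r_points + s_points, target_dim, seed)
    buckets = equal_width_partition([p[0] for p in projected], num_parts)
    r_buckets = buckets[:len(r_points)]
    s_buckets = buckets[len(r_points):]
    result = []
    for i, r in enumerate(r_points):
        cand = [j for j, b in enumerate(s_buckets)
                if abs(b - r_buckets[i]) <= 1]
        if len(cand) < k:  # too few candidates: fall back to scanning all of S
            cand = list(range(len(s_points)))
        cand.sort(key=lambda j: math.dist(r, s_points[j]))
        result.append(cand[:k])
    return result
```

With `num_parts=1` every point falls in the same bucket, so the partitioned join reduces to the brute-force join; larger values trade accuracy for fewer distance computations.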
Pages: 16
Related papers
50 in total
  • [1] An algorithm for incremental nearest neighbor search in high-dimensional data spaces
    Lee, DH
    Lee, HD
    Choi, IH
    Kim, HJ
    [J]. HUMAN SOCIETY AND THE INTERNET, PROCEEDINGS: INTERNET-RELATED SOCIO-ECONOMIC ISSUES, 2001, 2105 : 436 - 453
  • [2] A Heterogeneous High-Dimensional Approximate Nearest Neighbor Algorithm
    Dubiner, Moshe
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2012, 58 (10) : 6646 - 6658
  • [3] High-dimensional shared nearest neighbor clustering algorithm
    Yin, J
    Fan, XL
    Chen, YQ
    Ren, JT
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 494 - 502
  • [4] Nearest-neighbor-intersection algorithm for identifying strong predictors using high-dimensional data
    Roy, Arighna
    Denton, Anne
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2019, : 416 - 421
  • [5] A depth-based nearest neighbor algorithm for high-dimensional data classification
    Harikumar, Sandhya
    Aravindakshan Savithri, Akhil
    Kaimal, Ramachandra
    [J]. Turkish Journal of Electrical Engineering and Computer Sciences, 2019, 27 (06) : 4082 - 4101
  • [6] A depth-based nearest neighbor algorithm for high-dimensional data classification
    Harikumar, Sandhya
    Aravindakshan Savithri, Akhil
    Kaimal, Ramachandra
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (06) : 4082 - 4101
  • [7] Fuzzy nearest neighbor clustering of high-dimensional data
    Wang, HB
    Yu, YQ
    Zhou, DR
    Meng, B
    [J]. 2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 2569 - 2572
  • [8] Epsilon grid order: An algorithm for the similarity join on massive high-dimensional data
    Böhm, C
    Braunmüller, B
    Krebs, F
    Kriegel, HP
    [J]. SIGMOD RECORD, 2001, 30 (02) : 379 - 388
  • [9] A nearest neighbor search algorithm of high-dimensional data based on sequential NPsim matrix
    Li Wenfa
    Wang Gongming
    Ma Nan
    Liu Hongzhe
    [J]. High Technology Letters, 2016, 22 (03) : 241 - 247
  • [10] A Sparse Reconstructive Evidential K-Nearest Neighbor Classifier for High-Dimensional Data
    Gong, Chaoyu
    Su, Zhi-Gang
    Wang, Pei-Hong
    Wang, Qian
    You, Yang
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (06) : 5563 - 5576