Fast hierarchical clustering algorithm using locality-sensitive hashing

Cited by: 0
Authors
Koga, H [1]
Ishibashi, T [1]
Watanabe, T [1]
Affiliations
[1] Univ Electrocommun, Grad Sch Informat Syst, Chofu, Tokyo 182, Japan
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hierarchical clustering is a clustering method in which each point is initially regarded as a single cluster, and the clustering algorithm then repeatedly connects the two nearest clusters until only one cluster remains. Because the result is presented as a dendrogram, one can easily read off the distances and the inclusion relations between clusters. One drawback of agglomerative hierarchical clustering is its large time complexity of O(n^2), where n is the number of points in the data, which makes the method infeasible for large data sets. This paper proposes a fast approximation algorithm for single linkage clustering, a well-known agglomerative hierarchical clustering algorithm. Our algorithm reduces the time complexity to O(nB) by using Locality-Sensitive Hashing, known as a fast algorithm for approximate nearest neighbor search, to quickly find the near clusters to be connected. Here B is the maximum number of points thrown into a single hash entry; in practice it amounts to a small constant compared to n for sufficiently large hash tables. Experiments show that (1) the proposed algorithm obtains clustering results similar to those of the single linkage algorithm and that (2) it runs faster than the single linkage algorithm on large data.
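The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes Euclidean points and swaps in a random-projection LSH family with bucket width r. The search radius r is grown geometrically until one cluster remains, and points are only compared with points that share a hash bucket. The names `lsh_single_linkage`, `r0`, `growth`, `n_tables`, and `n_proj` are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import defaultdict


def lsh_single_linkage(points, r0, growth=2.0, n_tables=8, n_proj=2, seed=0):
    """Approximate single linkage (illustrative sketch, not the paper's code).

    Returns a list of merges (i, j, r): points i and j were in different
    clusters, shared a hash bucket, and lay within distance r of each other.
    Assumes r0 > 0, growth > 1, and at least one point.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    merges, clusters, r = [], n, r0
    while clusters > 1:
        for _ in range(n_tables):
            # One hash table: n_proj random projections, quantized with width r.
            projs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, r))
                for _ in range(n_proj)
            ]
            buckets = defaultdict(list)
            for idx, p in enumerate(points):
                key = tuple(
                    math.floor((sum(a * x for a, x in zip(vec, p)) + b) / r)
                    for vec, b in projs
                )
                buckets[key].append(idx)
            # Only points that share a bucket are compared with each other.
            for members in buckets.values():
                for x in range(len(members)):
                    for y in range(x + 1, len(members)):
                        i, j = members[x], members[y]
                        ri, rj = find(i), find(j)
                        if ri != rj and math.dist(points[i], points[j]) <= r:
                            parent[rj] = ri  # connect the two clusters
                            merges.append((i, j, r))
                            clusters -= 1
        r *= growth  # widen the search radius for the next phase
    return merges
```

Calling `lsh_single_linkage(points, r0=1.0)` on a list of coordinate tuples returns n - 1 merges in order of non-decreasing radius, which can be read as a coarse dendrogram whose merge heights are the radii rather than exact single linkage distances.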
Pages: 114 - 128
Number of pages: 15
Related papers
50 in total
  • [1] Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing
    Koga, Hisashi
    Ishibashi, Tetsuo
    Watanabe, Toshinori
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2007, 12 (01) : 25 - 53
  • [2] Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing
    Hisashi Koga
    Tetsuo Ishibashi
    Toshinori Watanabe
    [J]. Knowledge and Information Systems, 2007, 12 : 25 - 53
  • [3] Locality-Sensitive Hashing Optimizations for Fast Malware Clustering
    Oprisa, Ciprian
    Checiches, Marius
    Nandrean, Adrian
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTER COMMUNICATION AND PROCESSING (ICCP), 2014, : 97 - +
  • [4] Fast Redescription Mining Using Locality-Sensitive Hashing
    Karjalainen, Maiju
    Galbrun, Esther
    Miettinen, Pauli
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT VII, ECML PKDD 2024, 2024, 14947 : 124 - 142
  • [5] An Improved Algorithm for Locality-Sensitive Hashing
    Cen, Wei
    Miao, Kehua
    [J]. 10TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2015), 2015, : 61 - 64
  • [6] An adaptive mean shift clustering algorithm based on locality-sensitive hashing
    Zhang, Xinhong
    Cui, Yanbin
    Li, Duoyi
    Liu, Xianxing
    Zhang, Fan
    [J]. OPTIK, 2012, 123 (20) : 1891 - 1894
  • [7] A Fast and Memory-Efficient Spectral Library Search Algorithm Using Locality-Sensitive Hashing
    Wang, Lei
    Liu, Kaiyuan
    Li, Sujun
    Tang, Haixu
    [J]. PROTEOMICS, 2020, 20 (21-22)
  • [8] In Defense of Locality-Sensitive Hashing
    Ding, Kun
    Huo, Chunlei
    Fan, Bin
    Xiang, Shiming
    Pan, Chunhong
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (01) : 87 - 103
  • [9] Using Locality-sensitive Hashing for Rendezvous Search
    Jiang, Guann-Yng
    Chang, Cheng-Shang
    [J]. ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 1743 - 1749
  • [10] Hardware acceleration of k-mer clustering using locality-sensitive hashing
    Soto, Javier E.
    Krohmer, Thomas
    Hernandez, Cecilia
    Figueroa, Miguel
    [J]. 2019 22ND EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD), 2019, : 659 - 662