Fast spectral clustering learning with hierarchical bipartite graph for large-scale data

Cited by: 46
Authors
Yang, Xiaojun [1]
Yu, Weizhong [2]
Wang, Rong [3]
Zhang, Guohao [1]
Nie, Feiping [3]
Affiliations
[1] Guangdong Univ Technol, Sch Informat Engn, Guangzhou 510006, Peoples R China
[2] Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Xian 710049, Peoples R China
[3] Northwestern Polytech Univ, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Spectral clustering; Hierarchical graph; Bipartite graph; Large scale data; Out-of-sample;
DOI
10.1016/j.patrec.2018.06.024
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Spectral clustering (SC) is drawing more and more attention due to its effectiveness in unsupervised learning. However, conventional SC methods still have limitations. First, they are not suitable for large-scale problems because of their high computational complexity. Second, the neighborhood weighted graph is constructed with the Gaussian kernel, so extra work is required to tune the heat-kernel parameter. To overcome these issues, we propose a novel spectral clustering approach based on a hierarchical bipartite graph (SCHBG), which explores multiple layers of anchors arranged in a pyramid-style structure. The proposed algorithm first constructs a hierarchical bipartite graph and then performs spectral analysis on that graph, which largely reduces the computational complexity. Furthermore, we adopt a parameter-free yet effective neighbor assignment strategy to construct the similarity matrix, which avoids the need to tune the heat-kernel parameter. Finally, the algorithm is able to handle the out-of-sample problem for large-scale data at a significantly reduced computational cost. Experiments demonstrate the efficiency and effectiveness of the proposed SCHBG algorithm. Results show that the SCHBG approach achieves good clustering accuracy (76%) on an 8-million-sample dataset. Moreover, owing to the use of the bipartite graph, the algorithm reduces the time cost in out-of-sample situations while keeping almost the same clustering accuracy on large data. (C) 2018 Elsevier B.V. All rights reserved.
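The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every detail the abstract leaves open is an assumption here: anchors are picked by random sampling, the parameter-free neighbor assignment uses the common k-nearest closed form z_ij = (d_{i,k+1} - d_ij) / (k d_{i,k+1} - sum_{j<=k} d_ij) on squared distances, the layers are combined by multiplying the per-layer anchor graphs, and the function names (anchor_graph, hierarchical_bipartite_graph, spectral_embedding) are hypothetical.

```python
import numpy as np

def anchor_graph(X, anchors, k):
    """Parameter-free weights from each point to its k nearest anchors.

    Assumed closed form; needs at least k + 1 anchors. Rows sum to 1.
    """
    X = np.asarray(X, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # Squared Euclidean distances, shape (n, m). Dense for brevity only.
    d = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d, axis=1)[:, :k + 1]        # k + 1 nearest anchors
    ds = np.take_along_axis(d, idx, axis=1)       # sorted distances
    num = ds[:, [k]] - ds[:, :k]                  # d_{k+1} - d_j >= 0
    den = k * ds[:, k] - ds[:, :k].sum(axis=1)
    flat = den < 1e-12                            # all distances equal
    safe_den = np.where(flat, 1.0, den)[:, None]
    w = np.where(flat[:, None], 1.0 / k, num / safe_den)
    Z = np.zeros(d.shape)
    np.put_along_axis(Z, idx[:, :k], w, axis=1)
    return Z

def hierarchical_bipartite_graph(X, layer_sizes, k, seed=0):
    """Chain anchor graphs over shrinking anchor layers (pyramid).

    layer_sizes is a decreasing list such as [1000, 100]; each layer's
    anchors are sampled at random from the previous layer (assumption).
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(X, dtype=float)
    Z = None
    for m in layer_sizes:
        pick = rng.choice(len(points), size=m, replace=False)
        anchors = points[pick]
        Z_layer = anchor_graph(points, anchors, k)
        Z = Z_layer if Z is None else Z @ Z_layer  # compose the layers
        points = anchors
    return Z

def spectral_embedding(Z, c):
    """Top-c left singular vectors of the column-normalized graph."""
    col = Z.sum(axis=0)
    col = np.where(col > 0, col, 1.0)             # guard unused anchors
    B = Z / np.sqrt(col)
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :c]
```

In this sketch the spectral step works on an n-by-m matrix, where m is the size of the last anchor layer, instead of an n-by-n affinity matrix, which is where the cost reduction described in the abstract would come from. Cluster labels would then be obtained by running k-means on the rows of the returned embedding, and a new sample could be embedded by computing its own row with anchor_graph against the stored anchors.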
Pages: 345-352
Number of pages: 8