Sparse Self-Represented Network Map: A fast representative-based clustering method for large dataset and data stream

Cited by: 2
Authors
Liu, Zhen [1]
Zheng, Qiuhua [1]
Ji, Zhongping [1]
Zhao, Weihua [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou 310018, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Fast clustering; Sparse Self-Represented; Dynamic sparse initialization; Image recognition; ALGORITHMS;
DOI
10.1016/j.engappai.2017.11.002
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The demand for fast clustering has increased rapidly as tremendous amounts of data have been collected over the last decade. In this paper, we propose a nonparametric, representative-based Sparse Self-Represented Network Map for fast clustering on large datasets. Each node in the network generates a heat map for the dataset by receiving stimulations from data within its Accepting Field. We developed a weight-adjusting method to learn and summarize the clustering pattern of the data. The learned map is then used to compute clustering results by breaking weak links and finding connected components. Rather than employing an iterative process to find local minima, our network passes over the dataset only once and is able to capture the global pattern of the dataset as well as detect the natural number of clusters. As a nonparametric method, we propose Sparse Dynamic Instantiation to avoid the curse of dimensionality: a node or a link is instantiated only when stimulated by input data. As a result, the overall complexity is linear in the data dimension. Our algorithm is tested on synthetic and real datasets and compared with popular clustering algorithms (K-means++, Expectation Maximization, Mean Shift and StreamKM++) as well as state-of-the-art clustering algorithms (Affinity Propagation and Density Peak). We also applied our clustering algorithm to mobile location clustering, building a Visual Dictionary for image recognition, and clustering data streams. Our experiments indicate that our algorithm can be a better alternative to all of the compared clustering algorithms, especially when efficiency is the primary consideration: we drastically improve time and space complexity while retaining an equal level of accuracy. (C) 2017 Elsevier Ltd. All rights reserved.
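The abstract does not give the paper's update rules, so the following is only a minimal sketch of the general idea it describes (one pass over the data, nodes and links instantiated only when stimulated, weak links broken, clusters read off as connected components). The grid quantization, the function name `sparse_map_clusters`, and the parameters `cell` and `min_link` are assumptions made for illustration. They stand in for, and are not, the authors' Accepting Field or weight-adjusting method.

```python
from collections import defaultdict
import math


def sparse_map_clusters(points, cell=1.0, min_link=1):
    """Hypothetical one-pass sketch: lazy nodes and links, then connected components."""
    heat = defaultdict(int)   # node (grid cell) -> number of points that stimulated it
    links = defaultdict(int)  # (node_a, node_b) -> number of stimulations on that link

    # Single pass: a node or link exists only once some point has touched it.
    for p in points:
        home = tuple(math.floor(x / cell) for x in p)
        heat[home] += 1
        # One link per dimension, toward the nearer neighbouring cell,
        # so the work per point is linear in the dimension.
        for d, x in enumerate(p):
            frac = x / cell - home[d]
            step = 1 if frac >= 0.5 else -1
            neighbour = home[:d] + (home[d] + step,) + home[d + 1:]
            links[tuple(sorted((home, neighbour)))] += 1

    # Union-find over instantiated nodes only.
    parent = {node: node for node in heat}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Keep a link only if it is strong enough and both endpoints hold data;
    # everything else counts as a broken (weak) link.
    for (a, b), weight in links.items():
        if weight >= min_link and a in heat and b in heat:
            parent[find(a)] = find(b)

    # Label each point by the connected component of its home node.
    component_ids = {}
    labels = []
    for p in points:
        root = find(tuple(math.floor(x / cell) for x in p))
        labels.append(component_ids.setdefault(root, len(component_ids)))
    return labels


points = [(0.1, 0.2), (0.9, 0.3), (1.2, 0.4),
          (10.1, 10.2), (10.8, 10.1), (11.1, 10.3)]
labels = sparse_map_clusters(points, cell=1.0, min_link=1)
strict_labels = sparse_map_clusters(points, cell=1.0, min_link=3)
```

With `min_link=1` the two groups of points come out as two components. Raising `min_link` to 3 breaks the links between neighbouring cells, so each occupied cell becomes its own cluster. The number of clusters is never passed in. It falls out of the component count, which is the behaviour the abstract refers to as detecting the natural number of clusters.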
Pages: 121-130
Page count: 10
Related papers
18 in total
  • [1] Density Based Self Organizing Incremental Neural Network For Data Stream Clustering
    Xu, Baile
    Shen, Furao
    Zhao, Jinxi
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 2654 - 2661
  • [2] Data summarization based fast hierarchical clustering method for large datasets
    Patra, Bidyut Kr.
    Nandi, Sukumar
    Viswanath, P.
    2009 INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT AND ENGINEERING, PROCEEDINGS, 2009, : 278 - +
  • [3] A clustering method considering continuity of data based on self-organizing map
    Imamura, Hiroki
    Fujimura, Makoto
    Kuroda, Hideo
    Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers, 2006, 60 (08) : 1312 - 1316
  • [4] Application of Self-organizing Feature Map Neural Network Based on Data Clustering
    Hu, Xiang
    Yang, Yun
    Zhang, Lihong
    Xiang, Tao
    Hong, Chengqiu
    Zheng, Xiaotong
    PROCEEDINGS OF THE 10TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA 2012), 2012, : 797 - 802
  • [5] A Novel Spatial Clustering Method based on Wavelet Network and Density Analysis for Data Stream
    Xu, Chonghuan
    JOURNAL OF COMPUTERS, 2013, 8 (08) : 2139 - 2143
  • [6] A fast density-based data stream clustering algorithm with cluster centers self-determined for mixed data
    Chen, Jin-Yin
    He, Hui-Hao
    INFORMATION SCIENCES, 2016, 345 : 271 - 293
  • [7] Data Clustering Mining Method of Social Network Talent Recruitment Stream Based on MST Algorithm
    Li, Hongjian
    Hu, Nan
    ADVANCED HYBRID INFORMATION PROCESSING, ADHIP 2022, PT II, 2023, 469 : 99 - 111
  • [8] A density-based competitive data stream clustering network with self-adaptive distance metric
    Xu, Baile
    Shen, Furao
    Zhao, Jinxi
    NEURAL NETWORKS, 2019, 110 : 141 - 158
  • [9] Rough-DBSCAN: A fast hybrid density based clustering method for large data sets
    Viswanath, P.
    Babu, V. Suresh
    PATTERN RECOGNITION LETTERS, 2009, 30 (16) : 1477 - 1488
  • [10] A fast encryption method of large enterprise financial data based on adversarial neural network
    Chu Y.
    International Journal of Industrial and Systems Engineering, 2023, 44 (03) : 302 - 315