Sparse Self-Represented Network Map: A fast representative-based clustering method for large dataset and data stream

Times cited: 2

Authors:

Liu, Zhen [1]

Zheng, Qiuhua [1]

Ji, Zhongping [1]

Zhao, Weihua [1]

Affiliation:

[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou 310018, Zhejiang, Peoples R China

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2018 / Vol. 68

Funding:

National Natural Science Foundation of China

Keywords:

Fast clustering; Sparse Self-Represented; Dynamic sparse initialization; Image recognition; ALGORITHMS

DOI:

10.1016/j.engappai.2017.11.002

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

The demand for fast clustering has grown rapidly as tremendously large amounts of data have been collected over the last decade. In this paper, we propose a nonparametric, representative-based Sparse Self-Represented Network Map for fast clustering of large datasets. Each node in the network generates a heat map of the dataset by receiving stimulations from the data within its Accepting Field. We develop a weight adjusting method to learn and summarize the clustering pattern of the data. The learned map is then used to compute the clustering result by breaking weak links and finding connected components. Rather than employing an iterative process to find local minima, our network passes over the dataset only once; it captures the global pattern of the dataset and detects the natural number of clusters. To keep this nonparametric method free of the curse of dimensionality, we propose Sparse Dynamic Instantiation: a node or a link is instantiated only when it is stimulated by input data. As a result, the overall complexity is linear in the data dimension. Our algorithm is tested on synthetic and real datasets and compared with popular clustering algorithms (K-means++, Expectation Maximization, Mean Shift and StreamKM++) as well as state-of-the-art clustering algorithms (Affinity Propagation and Density Peak). We also apply our clustering algorithm to mobile location clustering, to building a Visual Dictionary for image recognition, and to clustering data streams. Our experiments indicate that our algorithm can be a better alternative to all the compared popular clustering algorithms, especially when efficiency is the primary consideration: it drastically improves time and space complexity while retaining an equal level of accuracy. (C) 2017 Elsevier Ltd. All rights reserved.
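The abstract describes the pipeline only at a high level (single pass, lazily instantiated nodes and links, weak links broken, connected components taken as clusters). The Python sketch below is a toy illustration of that general idea, not the authors' algorithm. Its assumptions are ours: each node's Accepting Field is a cell of a uniform grid of width `cell`; a link between two face-adjacent cells is stimulated when a point falls within `margin` of their shared face; and a link is treated as weak if it received fewer than `min_link` stimulations. The names `single_pass_map`, `cluster`, `cell`, `margin` and `min_link` are hypothetical, and the paper's weight adjusting rule is not reproduced.

```python
import math
from collections import defaultdict


def grid_key(point, cell):
    """Hypothetical node index: the grid cell whose accepting field contains the point."""
    return tuple(math.floor(x / cell) for x in point)


def single_pass_map(data, cell=1.0, margin=0.25):
    """One pass over the data; nodes and links exist only once a point stimulates them."""
    heat = defaultdict(int)   # node -> number of points it received (a crude "heat map")
    links = defaultdict(int)  # (node_a, node_b) -> points lying near their shared face
    keys = []
    for point in data:
        key = grid_key(point, cell)
        heat[key] += 1
        keys.append(key)
        # At most one face neighbour per dimension is stimulated, so the number of links
        # a point can create grows linearly with d, not with the 3**d cells around it.
        for j, x in enumerate(point):
            frac = x / cell - key[j]  # position inside the cell along dimension j, in [0, 1)
            if frac < margin:
                step = -1
            elif frac > 1.0 - margin:
                step = 1
            else:
                continue
            neighbour = key[:j] + (key[j] + step,) + key[j + 1:]
            links[tuple(sorted((key, neighbour)))] += 1
    return heat, links, keys


def cluster(data, cell=1.0, margin=0.25, min_link=2):
    """Break weak links, then label each point by the connected component of its node."""
    heat, links, keys = single_pass_map(data, cell, margin)
    parent = {node: node for node in heat}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Keep only well-supported links between nodes that were actually instantiated.
    for (a, b), weight in links.items():
        if weight >= min_link and a in heat and b in heat:
            parent[find(a)] = find(b)

    labels, ids = [], {}
    for key in keys:
        root = find(key)
        ids.setdefault(root, len(ids))  # number components in order of first appearance
        labels.append(ids[root])
    return labels


if __name__ == "__main__":
    points = [(0.9, 0.5), (0.95, 0.5), (1.05, 0.5), (1.1, 0.5), (5.5, 5.5), (5.4, 5.6)]
    print(cluster(points))  # -> [0, 0, 0, 0, 1, 1]
```

Here a dictionary keyed by cell index stands in for sparse dynamic instantiation: memory grows with the number of occupied cells and stimulated links rather than with the size of the full grid. Raising `min_link` above the support of the link between the first two cells (for example `min_link=5` in the usage above) breaks that link and splits the first group into two clusters.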


Pages: 121-130

Page count: 10
