A vector reconstruction based clustering algorithm particularly for large-scale text collection

Cited by: 2

Authors:

Liu, Ming [1,2]

Wu, Chong [1]

Chen, Lei [3]

Affiliations:

[1] Sch Management, Harbin, Peoples R China

[2] Sch Comp Sci & Technol, Harbin, Peoples R China

[3] Beijing Normal Univ, Int Business Fac, Zhuhai, Peoples R China

Source:

NEURAL NETWORKS | 2015 / Vol. 63

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Vector reconstruction; Large-scale text clustering; Partial tuning sub-process; Overall tuning sub-process; SELF-ORGANIZING MAPS; MUTUAL INFORMATION; WEIGHT; SELECTION; ENTROPY;

DOI:

10.1016/j.neunet.2014.10.012

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid development of internet technology, internet users face a large amount of textual data every day. Organizing texts into categories helps users extract useful information from large-scale text collections. Clustering is one of the most promising tools for categorizing texts because it is unsupervised. Unfortunately, most traditional clustering algorithms perform poorly on large-scale text collections, mainly because of the high-dimensional vector space and the semantic similarity among texts. To cluster large-scale text collections effectively and efficiently, this paper proposes a vector reconstruction based clustering algorithm in which only the features that can represent a cluster are preserved in the cluster's representative vector. The algorithm alternates between two sub-processes until it converges. The first is the partial tuning sub-process, in which feature weights are fine-tuned by an iterative process similar to the self-organizing map (SOM) algorithm. To speed up clustering, an intersection-based similarity measure and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The second is the overall tuning sub-process, in which features are reallocated among clusters and features that are useless for representing a cluster are removed from its representative vector. Experimental results on three text collections (two small-scale and one large-scale) demonstrate that the algorithm achieves high-quality performance on both small-scale and large-scale text collections. (C) 2014 Elsevier Ltd. All rights reserved.
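
The alternation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the similarity (a dot product over shared features), the update rule (a fixed learning rate `rate`), the reallocation rule (each feature stays only in the cluster that weights it highest), the seeding from chosen documents, and all function names and toy data are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: feature -> weight


def intersection_similarity(doc: Vector, rep: Vector) -> float:
    """Assumed measure: dot product over the features both vectors share."""
    small, large = (doc, rep) if len(doc) <= len(rep) else (rep, doc)
    return sum(w * large[f] for f, w in small.items() if f in large)


def partial_tuning(docs: List[Vector], reps: List[Vector], rate: float = 0.5) -> List[int]:
    """SOM-like pass: each document pulls its winning representative toward
    itself, but only on the features they have in common."""
    labels = []
    for doc in docs:
        winner = max(range(len(reps)),
                     key=lambda k: intersection_similarity(doc, reps[k]))
        rep = reps[winner]
        for f, w in doc.items():
            if f in rep:
                rep[f] += rate * (w - rep[f])
        labels.append(winner)
    return labels


def overall_tuning(reps: List[Vector]) -> None:
    """Assumed reallocation rule: a feature is kept only by the cluster that
    weights it highest and is removed from every other representative."""
    owner: Dict[str, int] = {}
    for k, rep in enumerate(reps):
        for f, w in rep.items():
            if f not in owner or w > reps[owner[f]][f]:
                owner[f] = k
    for k, rep in enumerate(reps):
        for f in [f for f in rep if owner[f] != k]:
            del rep[f]


def cluster(docs: List[Vector], seeds: List[int],
            max_rounds: int = 20) -> Tuple[List[int], List[Vector]]:
    """Alternate the two sub-processes until the assignments stop changing."""
    reps = [dict(docs[i]) for i in seeds]  # representatives start as copies of seed docs
    labels: List[int] = []
    for _ in range(max_rounds):
        new_labels = partial_tuning(docs, reps)
        overall_tuning(reps)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, reps


# Toy data (hypothetical): two pet-related and two finance-related documents.
DOCS = [
    {"cat": 1.0, "pet": 1.0, "news": 0.2},
    {"dog": 1.0, "pet": 0.5},
    {"stock": 1.0, "market": 1.0, "news": 0.6},
    {"market": 1.0, "bank": 0.5},
]
labels, reps = cluster(DOCS, seeds=[0, 2])
print(labels)
print(reps)
```

On this toy input the first two documents fall into one cluster and the last two into the other, and the shared feature `news` survives only in the representative that weights it more heavily. Restricting both the similarity and the update to shared features keeps each step proportional to the size of the smaller sparse vector, which is one plausible reading of why an intersection-based measure would speed up clustering.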


Pages: 141-155

Page count: 15

50 items in total

[41] Large Scale Text Clustering Method Study Based on MapReduce
Sun, Zhanquan
Li, Feng
Zhao, Yanling
Song, Lifeng
[J]. ADVANCES IN NEURAL NETWORKS - ISNN 2015, 2015, 9377 : 365 - 372
[42] Large-Scale Spectral Clustering Based on Representative Points
Yang, Libo
Liu, Xuemei
Nie, Feiping
Liu, Mingtang
[J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2019, 2019
[43] Large-Scale Image Clustering Based on Camera Fingerprints
Lin, Xufeng
Li, Chang-Tsun
[J]. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2017, 12 (04) : 793 - 808
[44] Large-scale spectral clustering based on pairwise constraints
Semertzidis, T.
Rafailidis, D.
Strintzis, M. G.
Daras, P.
[J]. INFORMATION PROCESSING & MANAGEMENT, 2015, 51 (05) : 616 - 624
[45] Graph Clustering for Large-Scale Text-Mining of Brain Imaging Studies
Chawla, Manisha
Mesa, Mounika
Miyapuram, Krishna P.
[J]. PROCEEDING OF THE THIRD INTERNATIONAL SYMPOSIUM ON WOMEN IN COMPUTING AND INFORMATICS (WCI-2015), 2015: 163 - 168
[46] Adaptive Weighted Clustering Algorithm for Large-Scale Satellite Cluster Network
Chen, Yu
Zhang, Yong
Chen, Shi
[J]. Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology, 2021, 41 (11) : 1188 - 1192
[47] Density Peaks Clustering Algorithm for Large-scale Data Based on Divide-and-Conquer Strategy
Wang, Yining
[J]. 2021 3RD INTERNATIONAL CONFERENCE ON MACHINE LEARNING, BIG DATA AND BUSINESS INTELLIGENCE (MLBDBI 2021), 2021: 416 - 419
[48] LSC: A Large-Scale Consensus-Based Clustering Algorithm for High-Performance FPGAs
Singhal, Love
Iyer, Mahesh A.
Adya, Saurabh
[J]. PROCEEDINGS OF THE 2017 54TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2017
[49] ROCKET: A Robust Parallel Algorithm for Clustering Large-Scale Transaction Databases
Loh, Woong-Kee
Moon, Yang-Sae
Ahn, Heejune
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (10) : 2048 - 2051
[50] An Improved Affinity Propagation Clustering Algorithm for Large-scale Data Sets
Liu, Xiaonan
Yin, Meijuan
Luo, Junyong
Chen, Wuping
[J]. 2013 NINTH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2013: 894 - 899
