A vector reconstruction based clustering algorithm particularly for large-scale text collection

Cited by: 2

Authors:

Liu, Ming [1,2]

Wu, Chong [1]

Chen, Lei [3]

Affiliations:

[1] Sch Management, Harbin, Peoples R China

[2] Sch Comp Sci & Technol, Harbin, Peoples R China

[3] Beijing Normal Univ, Int Business Fac, Zhuhai, Peoples R China

Source:

NEURAL NETWORKS | 2015 / Vol. 63

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Vector reconstruction; Large-scale text clustering; Partial tuning sub-process; Overall tuning sub-process; SELF-ORGANIZING MAPS; MUTUAL INFORMATION; WEIGHT; SELECTION; ENTROPY;

DOI:

10.1016/j.neunet.2014.10.012

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid development of internet technology, internet users face a large amount of textual data every day. Organizing texts into categories can help users extract useful information from large-scale text collections. Clustering is one of the most promising tools for categorizing texts because it is unsupervised. Unfortunately, most traditional clustering algorithms lose quality on large-scale text collections, mainly because of the high-dimensional vector space and the semantic similarity among texts. To cluster large-scale text collections effectively and efficiently, this paper proposes a vector reconstruction based clustering algorithm, in which a cluster's representative vector preserves only the features that can represent that cluster. The algorithm alternates between two sub-processes until it converges. The first is the partial tuning sub-process, where feature weights are fine-tuned by an iterative process similar to the self-organizing map (SOM) algorithm. To accelerate clustering, an intersection based similarity measurement and a corresponding neuron adjustment function are proposed and implemented in this sub-process. The second is the overall tuning sub-process, where features are reallocated among different clusters and the features that are useless for representing a cluster are removed from its representative vector. Experimental results on three text collections (two small-scale and one large-scale) demonstrate that our algorithm achieves high-quality performance on both small-scale and large-scale text collections. (C) 2014 Elsevier Ltd. All rights reserved.
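The two alternating sub-processes described in the abstract can be illustrated with a minimal Python sketch. The record does not give the paper's formulas, so everything below is an assumption, not the authors' method: documents are sparse term-weight dicts; the "intersection based similarity" is taken to be a dot product computed only over terms shared by a document and a representative vector; the neuron adjustment is a winner-take-all, SOM-style update applied to those shared terms; and overall tuning reallocates each feature to the cluster where its mean weight is highest, then keeps only the top `keep` features per cluster. All function names (`intersect_sim`, `adjust`, `partial_tune`, `overall_tune`, `init_reps`, `cluster`), the parameters `lr`, `epochs`, `keep`, `max_iter`, and the farthest-first seeding are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Vector = Dict[str, float]  # hypothetical sparse representation: term -> weight


def intersect_sim(a: Vector, b: Vector) -> float:
    """Assumed similarity: dot product over terms present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector only
    return sum(w * b[t] for t, w in a.items() if t in b)


def adjust(rep: Vector, doc: Vector, lr: float) -> None:
    """Assumed SOM-style update: move the winner's shared features toward doc."""
    for t in rep.keys() & doc.keys():
        rep[t] += lr * (doc[t] - rep[t])


def winner(doc: Vector, reps: List[Vector]) -> int:
    """Index of the most similar representative (the first one wins ties)."""
    scores = [intersect_sim(doc, r) for r in reps]
    return max(range(len(reps)), key=scores.__getitem__)


def partial_tune(docs: List[Vector], reps: List[Vector], lr: float, epochs: int) -> None:
    """Partial tuning sketch: fine-tunes weights in place, never adds or drops features."""
    for _ in range(epochs):
        for doc in docs:
            adjust(reps[winner(doc, reps)], doc, lr)


def overall_tune(docs: List[Vector], assign: List[int], k: int, keep: int) -> List[Vector]:
    """Overall tuning sketch: reallocate features across clusters, then prune."""
    sums = [defaultdict(float) for _ in range(k)]
    counts = [0] * k
    for doc, j in zip(docs, assign):
        counts[j] += 1
        for t, w in doc.items():
            sums[j][t] += w
    # Assumption: each feature goes to the cluster where its mean weight is highest.
    owner: Dict[str, Tuple[int, float]] = {}
    for j in range(k):
        for t, s in sums[j].items():
            mean = s / counts[j]
            if t not in owner or mean > owner[t][1]:
                owner[t] = (j, mean)
    reps: List[Vector] = [{} for _ in range(k)]
    for t, (j, mean) in owner.items():
        reps[j][t] = mean
    # Assumption: "useless" features are those outside the top `keep` by weight.
    return [dict(sorted(r.items(), key=lambda kv: -kv[1])[:keep]) for r in reps]


def init_reps(docs: List[Vector], k: int) -> List[Vector]:
    """Hypothetical farthest-first seeding: each new seed is least similar to earlier ones."""
    if not 0 < k <= len(docs):
        raise ValueError("need 0 < k <= len(docs)")
    chosen = [0]
    while len(chosen) < k:
        rest = (i for i in range(len(docs)) if i not in chosen)
        chosen.append(min(rest, key=lambda i: max(intersect_sim(docs[i], docs[c]) for c in chosen)))
    return [dict(docs[i]) for i in chosen]


def cluster(docs: List[Vector], k: int, lr: float = 0.3, epochs: int = 3,
            keep: int = 50, max_iter: int = 10) -> Tuple[Optional[List[int]], List[Vector]]:
    """Alternate the two sub-processes until the assignment stops changing."""
    reps = init_reps(docs, k)
    assign: Optional[List[int]] = None
    for _ in range(max_iter):
        partial_tune(docs, reps, lr, epochs)        # partial tuning sub-process
        new_assign = [winner(d, reps) for d in docs]
        if new_assign == assign:                    # converged
            break
        assign = new_assign
        reps = overall_tune(docs, assign, k, keep)  # overall tuning sub-process
    return assign, reps
```

In this sketch the similarity loop touches only terms the two sparse vectors share, so its cost grows with the size of the smaller vector rather than with the vocabulary; this is one plausible reading of why an intersection based measure would speed up clustering, not a claim about the paper's exact measure.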


Pages: 141-155

Number of pages: 15
