Semi-Supervised Semantic Dynamic Text Clustering Algorithm

Authors
Qian Z.-S. [1, 2]
Huang R.-Z. [1, 2]
Wei Q. [2]
Qin Y.-B. [1, 2]
Chen Y.-P. [1, 2]
Affiliations
[1] School of Computer Science and Technology, Guizhou University, Guiyang
[2] Public Big Data Laboratory of Guizhou, Guizhou University, Guiyang
Keywords
Dynamic text clustering; Semantic learning; Semi-supervised text clustering; Text clustering
DOI
10.3969/j.issn.1001-0548.2019.06.001
Abstract
In traditional dynamic text clustering, similar texts that are described in different words are assigned to different groups, and the number of clusters produced differs noticeably from the number of real categories. To address these problems, this paper proposes a semi-supervised semantic dynamic text clustering algorithm (SDCS). The algorithm represents texts semantically to capture the semantic relationships between them, and dynamically learns category semantics during clustering, so that texts can be clustered accurately by meaning. In addition, the algorithm uses semi-supervised clustering to supervise the generation of new classes, producing clustering results that are consistent with the real categories. Experimental results show that the proposed algorithm is effective and feasible. © 2019, Editorial Board of Journal of the University of Electronic Science and Technology of China. All rights reserved.
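The abstract does not specify how SDCS represents texts or updates its categories, so the following is only a rough, generic illustration of how the ideas it names (semantic vectors, category semantics learned during clustering, and supervision over new-class creation) could fit together in a single-pass clusterer. It is not the authors' algorithm. Assumptions, none taken from the paper: texts arrive as precomputed semantic vectors; each cluster's running-mean centroid stands in for its learned category semantics; a labelled seed text always joins its label's cluster, so each label opens at most one cluster; and an unlabelled text opens a new cluster only when its cosine similarity to every centroid falls below a hypothetical `threshold`. The class name `SemiSupervisedDynamicClusterer` and its parameters are likewise hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemiSupervisedDynamicClusterer:
    """Hypothetical single-pass clusterer; NOT the SDCS method from the paper.

    Assumes each text is already a semantic vector (e.g. a sentence embedding).
    Each cluster's running-mean centroid plays the role of its category semantics.
    """

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold              # hypothetical cut-off for opening a new cluster
        self.centroids: List[List[float]] = []
        self.sizes: List[int] = []
        self.label_to_cluster: Dict[str, int] = {}

    def _new_cluster(self, vec: Sequence[float]) -> int:
        self.centroids.append(list(vec))
        self.sizes.append(1)
        return len(self.centroids) - 1

    def _update(self, cid: int, vec: Sequence[float]) -> None:
        # Incremental mean: c <- c + (v - c) / n, so the category drifts with its members.
        self.sizes[cid] += 1
        n = self.sizes[cid]
        c = self.centroids[cid]
        for i, v in enumerate(vec):
            c[i] += (v - c[i]) / n

    def add(self, vec: Sequence[float], label: Optional[str] = None) -> int:
        """Assign one text vector to a cluster and return the cluster id."""
        # Supervision: a labelled text joins its label's cluster; a label opens one cluster at most.
        if label is not None:
            if label in self.label_to_cluster:
                cid = self.label_to_cluster[label]
                self._update(cid, vec)
            else:
                cid = self._new_cluster(vec)
                self.label_to_cluster[label] = cid
            return cid
        # Unlabelled text: join the most similar cluster if it is similar enough.
        if self.centroids:
            sims = [cosine(vec, c) for c in self.centroids]
            best = max(range(len(sims)), key=sims.__getitem__)
            if sims[best] >= self.threshold:
                self._update(best, vec)
                return best
        # Otherwise it is treated as a new category.
        return self._new_cluster(vec)
```

For example, with `threshold=0.8`, seeding `[1.0, 0.0]` as "sports" and `[0.0, 1.0]` as "finance" creates clusters 0 and 1; an unlabelled `[0.9, 0.1]` (cosine about 0.994 to cluster 0) joins cluster 0, while `[-1.0, 0.0]` is dissimilar to both centroids and opens cluster 2. Tying cluster creation to labels is one simple way to keep the cluster count close to the number of real categories; the paper's own supervision mechanism may work differently.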
Pages: 803-808
Number of pages: 5