Short Text Online Clustering Based on Incremental Robust Nonnegative Matrix Factorization

Authors
He C.-B. [1]
Tang Y. [2]
Zhang Q. [3]
Liu S.-Y. [1]
Liu H. [2]
Affiliations
[1] School of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, Guangdong
[2] School of Computer, South China Normal University, Guangzhou, 510631, Guangdong
[3] School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510006, Guangdong
Keywords
Incremental iterative update rules; L2,1 norm; Online clustering; Robust nonnegative matrix factorization; Short text clustering
DOI
10.3969/j.issn.0372-2112.2019.05.016
Abstract
Clustering large numbers of short texts from social media is valuable in many applications. However, such texts are typically noisy, grow rapidly, and arrive in massive volume, and most existing short text clustering algorithms are not effective enough to process them. To address this problem, we propose a short text online clustering algorithm based on incremental robust nonnegative matrix factorization (STOCIRNMF). The algorithm uses NMF to build the short text clustering model and defines its objective function with the L2,1 norm to improve robustness. In addition, STOCIRNMF clusters short texts incrementally by means of incremental iterative update rules. We conduct extensive experiments on real datasets of Sohu news titles and Weibo posts. The results show that STOCIRNMF not only clusters short texts better than several representative algorithms, but is also very effective at detecting microblog topics online. © 2019, Chinese Institute of Electronics. All rights reserved.
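The abstract names the ingredients (an NMF clustering model, an L2,1-norm objective, incremental updates) but does not give the update rules themselves. The sketch below is therefore not the STOCIRNMF algorithm; it is a minimal illustration that assumes the multiplicative updates commonly used for L2,1-norm NMF, and it assumes the simplest possible incremental step, in which the learned basis is frozen and only the newly arrived texts are encoded. The names `l21_norm`, `robust_nmf`, and `encode_new_texts` are hypothetical, as is the toy term-by-text matrix.

```python
import numpy as np


def l21_norm(E):
    """L2,1 norm: sum of the Euclidean norms of the columns of E."""
    return float(np.sum(np.linalg.norm(E, axis=0)))


def robust_nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (terms x texts) by F @ G, with F, G >= 0, under an L2,1 loss.

    Assumed update scheme (not taken from the paper): each text i gets the
    weight d_i = 1 / ||x_i - F g_i||, so badly fitted (noisy) texts pull
    less on the shared basis F.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k)) + eps
    G = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Per-text weights from the current column residuals.
        d = 1.0 / np.maximum(np.linalg.norm(X - F @ G, axis=0), eps)
        # Weighted multiplicative update of the basis.
        F *= ((X * d) @ G.T) / (F @ ((G * d) @ G.T) + eps)
        # The weight d_i cancels column by column in the update of G,
        # which leaves the standard NMF encoding update.
        G *= (F.T @ X) / (F.T @ F @ G + eps)
    return F, G


def encode_new_texts(F, X_new, n_iter=100, eps=1e-10, seed=0):
    """Simplified incremental step: keep F fixed and encode only the new texts."""
    rng = np.random.default_rng(seed)
    G_new = rng.random((F.shape[1], X_new.shape[1])) + eps
    FtX = F.T @ X_new
    FtF = F.T @ F
    for _ in range(n_iter):
        G_new *= FtX / (FtF @ G_new + eps)
    return G_new


if __name__ == "__main__":
    # Toy matrix: 4 terms x 6 texts with two rough topics and a little noise.
    X = np.array([[3, 2, 3, 0, 0, 0],
                  [2, 3, 2, 0, 0, 1],
                  [0, 0, 0, 3, 2, 3],
                  [0, 1, 0, 2, 3, 2]], dtype=float)
    F, G = robust_nmf(X, k=2)
    print("cluster of each text:", np.argmax(G, axis=0))
    print("L2,1 residual:", l21_norm(X - F @ G))
    # A newly arrived text is assigned without refitting the old ones.
    x_new = np.array([[0.0], [0.0], [2.0], [3.0]])
    print("cluster of new text:", np.argmax(encode_new_texts(F, x_new), axis=0))
```

Each text is assigned to the cluster with the largest entry in its column of `G`. Freezing `F` keeps the cost of handling a new batch independent of how many texts have already been seen, at the price of never adapting the topics; a fuller incremental scheme would also refresh `F` from the new batch.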
Pages: 1086-1093
Number of pages: 7