A Weighted Topical Document Embedding based Clustering Method for News Text

Cited by: 0
Authors
Zhu Dechao [1]
Song Hui [1]
Affiliation
[1] Donghua Univ, Sch Comp Sci, Shanghai, Peoples R China
Source
2016 IEEE INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC) | 2016
Keywords
Text Clustering; Skip-Gram; LDA; TF-IDF;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
As an unsupervised machine learning method, clustering can preliminarily group text without manual labeling, which effectively accelerates the organization, abstraction, and navigation of large news collections. News articles are long and contain many homonyms and polysemous words, which is one reason that traditional text clustering methods perform poorly when grouping news text. This paper presents a novel text representation method based on topical document embedding (TDE) to capture the semantic features of different topics. In the TDE representation, the document embedding of a news text is obtained by summing the Skip-Gram word vectors of all the keywords in the text, each weighted by its TF-IDF score. The topical document embedding is then learned by joining the topic vectors obtained from an LDA model with the document vectors from the document embedding. By clustering on the topical document embedding, we implement a novel text clustering method (TDE-TC). The experimental results show that news clustering based on the TDE representation performs better than clustering based on the bag-of-words model and the LDA model.
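The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact TF-IDF formula, the keyword selection rule, or how the topic and document vectors are "joined", so the sketch assumes a plain tf * log(N/df) weighting, treats every token as a keyword, and interprets joining as concatenation. The word vectors and topic vectors are hypothetical toy values standing in for the output of trained Skip-Gram and LDA models, and all function names are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Assumption: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def document_embedding(weights, word_vectors, dim):
    """Sum the word vectors, each scaled by the word's TF-IDF weight."""
    embedding = [0.0] * dim
    for word, weight in weights.items():
        vector = word_vectors.get(word)
        if vector is None:  # skip words without a vector
            continue
        for i in range(dim):
            embedding[i] += weight * vector[i]
    return embedding


def topical_document_embedding(doc_embedding, topic_vector):
    """Join the two vectors; concatenation is an assumption of this sketch."""
    return list(doc_embedding) + list(topic_vector)


# Hypothetical toy corpus: two tokenized "news" documents.
docs = [
    ["stock", "market", "stock"],
    ["match", "goal", "market"],
]
# Hypothetical 2-d vectors standing in for Skip-Gram word vectors.
word_vectors = {
    "stock": [1.0, 0.0],
    "market": [0.5, 0.5],
    "match": [0.0, 1.0],
    "goal": [0.0, 2.0],
}
# Hypothetical per-document topic vectors standing in for LDA output.
topic_vectors = [[0.9, 0.1], [0.2, 0.8]]

weights = tf_idf(docs)
tde = [
    topical_document_embedding(document_embedding(w, word_vectors, 2), t)
    for w, t in zip(weights, topic_vectors)
]
```

Each row of `tde` could then be passed to any standard clustering algorithm. Note that under this weighting a word appearing in every document, such as "market" here, receives a TF-IDF score of zero and contributes nothing to the document embedding.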
Pages: 1060-1065
Page count: 6
Related papers
50 in total
  • [41] A Document Clustering Method based on Hierarchical Algorithm with Model Clustering
    Sun, Haojun
    Liu, Zhihui
    Kong, Lingjun
    2008 22ND INTERNATIONAL WORKSHOPS ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOLS 1-3, 2008, : 1229 - +
  • [42] Word Embedding of Dimensionality Reduction for Document Clustering
    Zhu, Pengyu
    Lang, Qi
    Liu, Xiaodong
    2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023, : 4371 - 4376
  • [43] Short Text Embedding for Clustering based on Word and Topic Semantic Information
    Chen, Ziheng
    Ren, Jiangtao
    2019 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2019), 2019, : 61 - 70
  • [44] An Ontology Learning Method Based on Document Clustering
    Wei, Xianmin
    FRONTIERS OF MANUFACTURING AND DESIGN SCIENCE II, PTS 1-6, 2012, 121-126 : 1911 - 1915
  • [45] A String Kernel Based Method for Document Clustering
    Shi, Qingwei
    Wu, Rongteng
    2ND INTERNATIONAL SYMPOSIUM ON COMPUTER NETWORK AND MULTIMEDIA TECHNOLOGY (CNMT 2010), VOLS 1 AND 2, 2010, : 526 - 529
  • [46] Document-based topic coherence measures for news media text
    Korencic, Damir
    Ristov, Strahil
    Snajder, Jan
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 114 : 357 - 373
  • [47] A Weighted Word Embedding Model for Text Classification
    Ren, Haopeng
    Zeng, ZeQuan
    Cai, Yi
    Du, Qing
    Li, Qing
    Xie, Haoran
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2019), PT I, 2019, 11446 : 419 - 434
  • [48] Performance Evaluation of Semantic Based and Ontology Based Text Document Clustering Techniques
    Punitha, S. C.
    Punithavalli, M.
    INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY AND SYSTEM DESIGN 2011, 2012, 30 : 100 - 106
  • [49] Knowledge-based Document Embedding for Cross-Domain Text Classification
    Li, Yiming
    Wei, Baogang
    Yao, Liang
    Chen, Hui
    Li, Zherong
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 1395 - 1402
  • [50] Adaptive Centroid-based Clustering Algorithm for Text Document Data
    Li, Ximing
    Ouyang, Jihong
    Zhou, Xiaotang
    Fu, Bo
    2014 SIXTH INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND PROGRAMMING (PAAP), 2014, : 63 - 68