Short Text Clustering based on Word Semantic Graph with Word Embedding Model

Cited by: 9
Authors
Jinarat, Supakpong [1]
Manaskasemsak, Bundit [1]
Rungsawang, Arnon [1]
Affiliation
[1] Kasetsart Univ, Mass Informat & Knowledge Engn Lab, Dept Comp Engn, Fac Engn, Bangkok, Thailand
Keywords
word semantic; graph clustering; short text; word embedding
DOI
10.1109/SCIS-ISIS.2018.00223
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Nowadays, the number of short messages and other short text content created on the Internet is increasing rapidly. The need to manipulate, analyze, and extract knowledge from them makes mining techniques such as text clustering more important. However, applying traditional text clustering algorithms, which consider only common words or phrases, to group short texts is inefficient because of the sparsity problem. In this paper, we propose a new clustering technique, called word semantic graph clustering, based on the use of text concepts. We apply the word embedding model from Word2Vec to capture the semantic meaning of words and then construct semantic subgraphs in which words, represented as vertices, are connected when their semantic similarity is high. Finally, short text documents are assigned to the same cluster if they contain at least one word belonging to the same semantic subgraph. Experimental results on two real datasets show that the proposed approach outperforms state-of-the-art text clustering algorithms. In addition, it produces a more appropriate label for each cluster than the comparative algorithms do.
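The pipeline outlined in the abstract (embed words, link highly similar words into subgraphs, then group documents by the subgraph their words fall into) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy 2-D vectors stand in for Word2Vec embeddings, the cosine threshold of 0.9 is an arbitrary choice, and treating each connected component as a "semantic subgraph" is an assumption made for illustration. All names (`EMBEDDINGS`, `semantic_subgraphs`, `cluster_documents`) are hypothetical.

```python
import math
from collections import defaultdict

# Toy 2-D vectors standing in for Word2Vec embeddings (made-up values,
# not taken from the paper or from any trained model).
EMBEDDINGS = {
    "football": (0.9, 0.1),
    "soccer": (0.85, 0.15),
    "goal": (0.8, 0.3),
    "stock": (0.1, 0.9),
    "market": (0.15, 0.85),
}


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def semantic_subgraphs(embeddings, threshold):
    """Link words whose cosine similarity is >= threshold and return a
    mapping from each word to the id of its connected component.
    Assumption: a 'semantic subgraph' is modelled as a connected component."""
    parent = {w: w for w in embeddings}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    words = sorted(embeddings)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)
    return {w: find(w) for w in words}


def cluster_documents(docs, word_to_subgraph):
    """Assign each document to every subgraph that contains one of its words,
    so documents sharing a subgraph end up in the same cluster."""
    clusters = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            if word in word_to_subgraph:
                clusters[word_to_subgraph[word]].add(doc_id)
    return dict(clusters)


DOCS = [
    "soccer match tonight",
    "late goal wins it",
    "stock prices fall",
    "market rally continues",
]

word_to_subgraph = semantic_subgraphs(EMBEDDINGS, threshold=0.9)
clusters = cluster_documents(DOCS, word_to_subgraph)
for subgraph_id, doc_ids in clusters.items():
    print(subgraph_id, sorted(doc_ids))
```

In this sketch the first two documents share no word, yet they fall into the same cluster because "soccer" and "goal" sit in the same subgraph, which is how the approach sidesteps the sparsity of short texts. A document whose words span several subgraphs is placed in several clusters here; how the paper resolves such cases is not stated in the abstract.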
Pages: 1427-1432
Number of pages: 6
Related papers
50 in total
  • [1] Short Text Embedding for Clustering based on Word and Topic Semantic Information
    Chen, Ziheng
    Ren, Jiangtao
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2019), 2019, : 61 - 70
  • [2] Text Semantic Steganalysis Based on Word Embedding
    Zuo, Xin
    Hu, Huanhuan
    Zhang, Weiming
    Yu, Nenghai
    [J]. CLOUD COMPUTING AND SECURITY, PT IV, 2018, 11066 : 485 - 495
  • [3] Text Clustering Algorithm Based on the Graph Structures of Semantic Word Co-occurrence
    Jin, Chun-Xia
    Bai, Qiu-Chan
    [J]. 2016 INTERNATIONAL CONFERENCE ON INFORMATION SYSTEM AND ARTIFICIAL INTELLIGENCE (ISAI 2016), 2016, : 497 - 502
  • [4] Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification
    Wang, Peng
    Xu, Bo
    Xu, Jiaming
    Tian, Guanhua
    Liu, Cheng-Lin
    Hao, Hongwei
    [J]. NEUROCOMPUTING, 2016, 174 : 806 - 814
  • [5] News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model
    Xiong, Ao
    Liu, Derong
    Tian, Hongkang
    Liu, Zhengyuan
    Yu, Peng
    Kadoch, Michel
    [J]. TSINGHUA SCIENCE AND TECHNOLOGY, 2021, 26 (06) : 886 - 893
  • [6] News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model
    Ao Xiong
    Derong Liu
    Hongkang Tian
    Zhengyuan Liu
    Peng Yu
    Michel Kadoch
    [J]. Tsinghua Science and Technology, 2021, 26 (06) : 886 - 893
  • [7] Feature Word Vector Based on Short Text Clustering
    Liu, Xin
    Wang, Bo
    Xi, Yao-yi
    Mao, Er-song
    Ke, Sheng-cai
    Tang, Yong-wang
    [J]. COMPUTER SCIENCE AND TECHNOLOGY (CST2016), 2017, : 533 - 545
  • [8] Probabilistic topic modeling for short text based on word embedding networks
    Pita, Marcelo
    Nunes, Matheus
    Pappa, Gisele L.
    [J]. APPLIED INTELLIGENCE, 2022, 52 (15) : 17829 - 17844
  • [9] Probabilistic topic modeling for short text based on word embedding networks
    Marcelo Pita
    Matheus Nunes
    Gisele L. Pappa
    [J]. Applied Intelligence, 2022, 52 : 17829 - 17844
  • [10] Mixed Word Embedding Method Based on Knowledge Graph Augment for Text Classification
    Wang, Hongzhong
    Guo, Kun
    Liu, Zhanghui
    [J]. 2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 1618 - 1623