Short Text Clustering based on Word Semantic Graph with Word Embedding Model

Cited by: 9
Authors
Jinarat, Supakpong [1]
Manaskasemsak, Bundit [1]
Rungsawang, Arnon [1]
Affiliations
[1] Kasetsart Univ, Mass Informat & Knowledge Engn Lab, Dept Comp Engn, Fac Engn, Bangkok, Thailand
Keywords
word semantic; graph clustering; short text; word embedding
DOI
10.1109/SCIS-ISIS.2018.00223
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The number of short messages and other short text contents created on the Internet is increasing rapidly. The need to manipulate, analyze, and extract knowledge from them makes mining techniques such as text clustering more important. However, applying traditional text clustering algorithms, which consider only common words or phrases, to group short texts is inefficient because of the sparsity problem. In this paper, we propose a new clustering technique, called word semantic graph clustering, based on the use of text concepts. We apply the word embedding model from Word2Vec to capture the semantic meaning of words and then construct semantic subgraphs in which words, represented as vertices, are connected when their semantic similarity is high. Finally, short text documents are assigned to the same cluster if they contain at least one word belonging to the same semantic subgraph. Experimental results on two real datasets show that the proposed approach outperforms state-of-the-art text clustering algorithms. In addition, it produces a more appropriate label for each cluster than the compared algorithms do.
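The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors in `EMBEDDINGS` stand in for trained Word2Vec embeddings, the threshold `0.9` is an arbitrary illustrative value, and treating a "semantic subgraph" as a connected component of the thresholded similarity graph is an assumption, since the abstract does not specify how subgraphs are formed. All function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
EMBEDDINGS = {
    "football": (0.90, 0.10, 0.00),
    "soccer": (0.85, 0.15, 0.05),
    "goal": (0.80, 0.30, 0.10),
    "stock": (0.05, 0.90, 0.20),
    "market": (0.10, 0.85, 0.25),
    "rain": (0.10, 0.10, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_subgraphs(embeddings, threshold):
    """Map each word to the representative of its subgraph.

    Two words share an edge when their cosine similarity reaches the
    threshold. Subgraphs are taken to be connected components, found
    with union-find (an assumption, not stated in the abstract).
    """
    parent = {word: word for word in embeddings}

    def find(word):
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    words = sorted(embeddings)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)
    return {word: find(word) for word in words}


def cluster_documents(docs, word_to_subgraph):
    """Group document indices by the subgraphs their words fall into.

    A document joins a cluster if it contains at least one word from
    that subgraph, so it may appear in more than one cluster.
    """
    clusters = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            if token in word_to_subgraph:
                clusters[word_to_subgraph[token]].add(doc_id)
    return {root: sorted(ids) for root, ids in clusters.items()}


docs = [
    "football match tonight",
    "great soccer goal",
    "stock prices fall",
    "market rally",
]
word_to_subgraph = semantic_subgraphs(EMBEDDINGS, threshold=0.9)
clusters = cluster_documents(docs, word_to_subgraph)
print(clusters)
```

In this toy run the first two documents share no surface word, yet they land in the same cluster because "football", "soccer", and "goal" fall into one subgraph. That is the sparsity problem the abstract refers to. A word from the subgraph could also serve as a candidate cluster label, though how the paper selects labels is not stated in the abstract.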
Pages: 1427 - 1432
Page count: 6
Related papers
50 in total
  • [21] Information Retrieval Based on Word Semantic Clustering
    Chang, Chia-Yang
    Lin, Yan-Ting
    Lee, Shie-Jue
    Lai, Chih-Chin
    2018 11TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2018), 2018
  • [22] A Custom Word Embedding Model for Clustering of Maintenance Records
    Bhardwaj, Abhijeet Sandeep
    Deep, Akash
    Veeramani, Dharmaraj
    Zhou, Shiyu
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2022, 18 (02) : 816 - 826
  • [23] A Short Text Topic Model Based on Semantics and Word Expansion
    Li Zhen
    Shao Yabin
    Yang Ning
    2022 IEEE 2ND INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND ARTIFICIAL INTELLIGENCE (CCAI 2022), 2022, : 60 - 64
  • [24] WORD DISTRIBUTED REPRESENTATION BASED TEXT CLUSTERING
    Feng, Shan
    Liu, Ruifang
    Wang, Qinlong
    Shi, Ruisheng
    2014 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2014, : 389 - 393
  • [25] Graph and Centroid-based Word Clustering
    Thaiprayoon, Santipong
    Unger, Herwig
    Kubek, Mario
    2020 4TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2020, 2020, : 163 - 168
  • [26] Automated Short-Answer Grading using Semantic Similarity based on Word Embedding
    Lubis, Fetty Fitriyanti
    Mutaqin
    Putri, Atina
    Waskita, Dana
    Sulistyaningtyas, Tri
    Arman, Arry Akhmad
    Rosmansyah, Yusep
    INTERNATIONAL JOURNAL OF TECHNOLOGY, 2021, 12 (03) : 571 - 581
  • [27] Measuring text similarity based on structure and word embedding
    Farouk, Mamdouh
    COGNITIVE SYSTEMS RESEARCH, 2020, 63 : 1 - 10
  • [28] Enhancing Semantic Word Representations by Embedding Deep Word Relationships
    Nugaliyadde, Anupiya
    Wong, Kok Wai
    Sohel, Ferdous
    Xie, Hong
    PROCEEDINGS OF 2019 11TH INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING (ICCAE 2019), 2019, : 82 - 87
  • [29] Word Embedding-Based Biomedical Text Summarization
    Rouane, Oussama
    Belhadef, Hacene
    Bouakkaz, Mustapha
    EMERGING TRENDS IN INTELLIGENT COMPUTING AND INFORMATICS: DATA SCIENCE, INTELLIGENT INFORMATION SYSTEMS AND SMART COMPUTING, 2020, 1073 : 288 - 297
  • [30] Study on the Chinese Word Semantic Relation Classification with Word Embedding
    Shijia, E.
    Jia, Shengbin
    Xiang, Yang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2017, 2018, 10619 : 849 - 855