Feature Word Vector Based on Short Text Clustering

Authors
Liu, Xin [1]
Wang, Bo [1]
Xi, Yao-yi [2]
Mao, Er-song [1]
Ke, Sheng-cai [1]
Tang, Yong-wang [1]
Affiliations
[1] PLA Informat Engn Univ, Sch Informat & Syst Engn, Zhengzhou, High Tech Zone, Peoples R China
[2] PLA Univ Foreign Language, Luoyang, Jianxi Zone, Peoples R China
Keywords
Short text; Feature words; Word vectors; Similarity calculation; Clustering
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper proposes a short text clustering algorithm based on feature word vectors to address the poor clustering performance caused by the sparse features and rapid updating of short texts. First, a feature word extraction formula based on part-of-speech (POS) weighting is defined and used to extract the feature words that represent each short text. Second, word vectors that capture the semantics of the feature words are obtained by training the Continuous Skip-gram Model on a large-scale corpus. Finally, Word Mover's Distance (WMD) is used to calculate the similarity between short texts, and this similarity drives a hierarchical clustering algorithm. Evaluation on four test datasets shows that the proposed algorithm significantly outperforms traditional clustering algorithms, with a mean F value that is on average 55.43% higher than that of the second-best method.
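The three steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the POS weights in `POS_WEIGHT` are hypothetical (the abstract does not give the extraction formula), `TOY_VECTORS` is a hand-made 2-D stand-in for skip-gram embeddings trained on a large corpus, and `relaxed_wmd` is the nearest-neighbour relaxation of WMD rather than the exact optimal-transport distance. The clustering step uses plain average-linkage agglomeration, which is one of several possible hierarchical schemes.

```python
import math
from itertools import combinations

# Hypothetical POS weights; the paper's actual weighting formula is not in the abstract.
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.6, "ADJ": 0.4}

# Toy 2-D vectors standing in for skip-gram embeddings (assumption for illustration).
TOY_VECTORS = {
    "goal": (1.0, 0.1), "match": (0.9, 0.2), "striker": (0.95, 0.0),
    "stock": (0.0, 1.0), "market": (0.1, 0.9), "shares": (0.05, 0.95),
}

def feature_words(tagged, top_k=2):
    """Score each word by (frequency x POS weight) and keep the top_k words."""
    scores = {}
    for word, pos in tagged:
        scores[word] = scores.get(word, 0.0) + POS_WEIGHT.get(pos, 0.0)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return [w for w in ranked[:top_k] if scores[w] > 0]

def relaxed_wmd(a, b, vectors):
    """Relaxed WMD: every word moves to its nearest word in the other text.

    Uses uniform word weights and takes the larger of the two directions.
    Assumes every word in a and b has an entry in vectors.
    """
    def one_way(src, dst):
        return sum(min(math.dist(vectors[s], vectors[d]) for d in dst)
                   for s in src) / len(src)
    return max(one_way(a, b), one_way(b, a))

def agglomerative(docs, vectors, n_clusters):
    """Average-linkage hierarchical clustering over relaxed-WMD distances."""
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        total = sum(relaxed_wmd(docs[i], docs[j], vectors) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)

# Four toy short texts, already POS-tagged (tags are illustrative).
tagged_texts = [
    [("striker", "NOUN"), ("scores", "VERB"), ("goal", "NOUN")],
    [("stock", "NOUN"), ("market", "NOUN"), ("falls", "VERB")],
    [("match", "NOUN"), ("ends", "VERB"), ("goal", "NOUN")],
    [("shares", "NOUN"), ("rise", "VERB"), ("market", "NOUN")],
]
docs = [feature_words(t) for t in tagged_texts]
clusters = agglomerative(docs, TOY_VECTORS, n_clusters=2)
print(clusters)  # [[0, 2], [1, 3]]
```

In this toy run the two sports texts and the two finance texts end up in separate clusters even though texts 0 and 2 share only one word, which is the behaviour the embedding-based distance is meant to provide for sparse short texts.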
Pages: 533-545
Number of pages: 13