Short Text Classification Based on Wikipedia and Word2vec

Cited by: 0
Authors
Liu Wensen [1]
Cao Zewen [1]
Wang Jun [1]
Wang Xiaoyi [2]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Informat Syst Engn Lab, Changsha, Hunan, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
Keywords
short text; classification; Wikipedia; Word2vec; semantic relatedness
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Unlike long texts, Chinese short texts have very sparse features, which is the primary cause of the low accuracy obtained when traditional classification methods are applied to them. This paper proposes a novel method that tackles the problem by expanding the features of short texts using Wikipedia and Word2vec. First, semantically related concept sets are built for Wikipedia concepts: we retrieve the articles that are highly relevant to each Wikipedia concept and use the word2vec tool to measure the semantic relatedness between the target concept and its related concepts. We then use these related concept sets to extend the short texts. Compared with traditional statistical measures of similarity between concepts, this method yields more accurate semantic relatedness. The experimental results show that expanding the features of short texts improves classification accuracy. In particular, the proposed method appeared to be more effective.
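The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors stand in for embeddings that would normally come from a trained word2vec model, and the function names, the candidate lists, the 0.5 threshold, and the `top_k` cutoff are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_related_concepts(vectors, candidates, threshold=0.5):
    """For each target concept, keep the candidate concepts whose embedding
    similarity reaches the threshold, ordered from most to least related.
    `candidates` is assumed to come from Wikipedia articles relevant to the concept."""
    related = {}
    for concept, cands in candidates.items():
        if concept not in vectors:
            continue  # no embedding for the target concept, nothing to score
        scored = [(c, cosine(vectors[concept], vectors[c]))
                  for c in cands if c in vectors]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        related[concept] = [c for c, score in scored if score >= threshold]
    return related

def expand_short_text(tokens, related, top_k=2):
    """Append up to top_k related concepts per token, without duplicates."""
    expanded = list(tokens)
    for token in tokens:
        for concept in related.get(token, [])[:top_k]:
            if concept not in expanded:
                expanded.append(concept)
    return expanded

# Toy 3-dimensional vectors (assumed values, not real word2vec output).
vectors = {
    "apple": [0.9, 0.1, 0.0],
    "fruit": [0.8, 0.2, 0.1],
    "iphone": [0.1, 0.9, 0.2],
    "river": [0.0, 0.1, 0.9],
}
# Hypothetical candidate concepts gathered for the target concept "apple".
candidates = {"apple": ["fruit", "iphone", "river"]}

related = build_related_concepts(vectors, candidates, threshold=0.5)
expanded = expand_short_text(["apple", "price"], related)
print(related)
print(expanded)
```

The expanded token list would then be passed to an ordinary text classifier in place of the original sparse short text.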
Pages: 1195-1200
Page count: 6