Representation of Semantic Word Embeddings Based on SLDA and Word2vec Model

Cited by: 0
Authors
TANG Huanling [1, 2, 3]
ZHU Hui [4]
WEI Hongmin [4]
ZHENG Han [4]
MAO Xueli [4]
LU Mingyu [5]
GUO Jin [6]
Affiliations
[1] School of Computer Science and Technology, Shandong Technology and Business University
[2] Co-innovation Center of Shandong Colleges and Universities: Future Intelligent Computing
[3] Key Laboratory of Intelligent Information Processing in Universities of Shandong, Shandong Technology and Business University
[4] School of Information and Electronic Engineering, Shandong Technology and Business University
[5] Information Science and Technology College, Dalian Maritime University
[6] School of Computer and Information Technology, Liaoning Normal University
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP391.1 [Text information processing]
Discipline classification code
081203; 0835
Abstract
To address semantic loss in text representation, this paper proposes wt2svec, a new method for embedding words in a semantic space, based on supervised latent Dirichlet allocation (SLDA) and Word2vec. SLDA is used to generate a global topic embedding word vector; it discovers global semantic information through the latent topics over the whole document set. Word2vec is used to obtain a local semantic embedding word vector. The new semantic word vector is obtained by combining the global semantic information with the local semantic information. In addition, a document semantic vector, named doc2svec, is generated. Experimental results on different datasets show that, compared with Word2vec, the wt2svec model clearly improves the accuracy of word semantic similarity and improves text categorization performance.
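The abstract does not say how the global and local vectors are combined, so the following is only a minimal sketch of one plausible reading. It assumes that each word already has a topic vector (standing in for the SLDA output) and a local vector (standing in for the Word2vec output), that "combining" means concatenating the two L2-normalized vectors, and that a document vector is the mean of its word vectors. The function names, the toy vectors, and both of these rules are hypothetical and are not taken from the paper.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length; leave all-zero vectors unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def combine_word_vector(topic_vec, local_vec):
    """ASSUMPTION: combine global and local information by concatenating
    the two normalized vectors. The paper's actual rule may differ."""
    return np.concatenate([l2_normalize(topic_vec), l2_normalize(local_vec)])

def document_vector(word_vecs):
    """ASSUMPTION: represent a document as the mean of its word vectors."""
    return np.mean(np.stack(word_vecs), axis=0)

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for SLDA word-topic weights (3 topics); values are made up.
topic = {
    "bank":  np.array([0.7, 0.2, 0.1]),
    "loan":  np.array([0.8, 0.1, 0.1]),
    "river": np.array([0.1, 0.1, 0.8]),
}
# Toy stand-ins for Word2vec embeddings (4 dimensions); values are made up.
local = {
    "bank":  np.array([0.5, 0.1, 0.3, 0.2]),
    "loan":  np.array([0.6, 0.0, 0.2, 0.1]),
    "river": np.array([0.1, 0.7, 0.2, 0.4]),
}

# One combined vector per word, then one vector for a two-word document.
combined = {w: combine_word_vector(topic[w], local[w]) for w in topic}
doc = document_vector([combined["bank"], combined["loan"]])

sim_bank_loan = cosine(combined["bank"], combined["loan"])
sim_bank_river = cosine(combined["bank"], combined["river"])
```

With both halves normalized, the cosine similarity of two combined vectors is the average of their topic-space and local-space similarities, so neither source dominates only because of its scale. In practice the toy dictionaries would be replaced by vectors from trained SLDA and Word2vec models.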
Pages: 647-654
Page count: 8
Related papers
  • [1] Representation of Semantic Word Embeddings Based on SLDA and Word2vec Model
    Tang Huanling
    Zhu Hui
    Wei Hongmin
    Zheng Han
    Mao Xueli
    Lu Mingyu
    Guo Jin
    [J]. CHINESE JOURNAL OF ELECTRONICS, 2023, 32 (03) : 647 - 654
  • [2] Stability of Word Embeddings Using Word2Vec
    Chugh, Mansi
    Whigham, Peter A.
    Dick, Grant
    [J]. AI 2018: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, 11320 : 812 - 818
  • [3] Word Semantic Similarity Calculation Based on Word2vec
    Jin, Xiaolin
    Zhang, Shuwu
    Liu, Jie
    [J]. 2018 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES (ICCAIS), 2018 : 12 - 16
  • [4] Word Clustering based on Word2vec and Semantic Similarity
    Luo Jie
    Wang Qinglin
    Li Yuan
    [J]. 2014 33RD CHINESE CONTROL CONFERENCE (CCC), 2014 : 517 - 521
  • [5] Improving Word Representation by Tuning Word2Vec Parameters with Deep Learning Model
    Tezgider, Murat
    Yildiz, Beytullah
    Aydin, Galip
    [J]. 2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP), 2018
  • [6] STMC: Semantic Tag Medical Concept using Word2Vec representation
    Martinez Soriano, Ignacio
    Castro Pena, Juan Luis
    [J]. 2018 31ST IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS 2018), 2018 : 393 - 398
  • [7] Word2vec Semantic Representation in Multilabel Classification for Indonesian News Article
    Rahmawati, Dyah
    Khodra, Masayu Leylia
    [J]. 2016 INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS - CONCEPTS, THEORY AND APPLICATION (ICAICTA), 2016
  • [8] ECG analysis based on Word2Vec model
    Oliinyk, Yurii
    Tereschenko, Andrii
    Baklan, Igor
    Beraudo, Elisa
    [J]. IDDM 2021: INFORMATICS & DATA-DRIVEN MEDICINE: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON INFORMATICS & DATA-DRIVEN MEDICINE (IDDM 2021), 2021, 3038 : 213 - 222
  • [9] Analysis of the Word2Vec Model for Semantic Similarities in Indonesian Words
    Manalu, Louisten Novandi T.
    Bijaksana, Moch Arif
    Suryani, Arie Ardiyanti
    [J]. 2019 7TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICOICT), 2019 : 363 - 367
  • [10] Word2vec's Distributed Word Representation for Hindi Word Sense Disambiguation
    Kumari, Archana
    Lobiyal, D. K.
    [J]. DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY (ICDCIT 2020), 2020, 11969 : 325 - 335