Representation of Semantic Word Embeddings Based on SLDA and Word2vec Model

Times cited: 1
Authors
Tang Huanling [1, 2, 3]
Zhu Hui [4 ]
Wei Hongmin [4 ]
Zheng Han [4 ]
Mao Xueli [4 ]
Lu Mingyu [5 ]
Guo Jin [6 ]
Affiliations
[1] Shandong Technol & Business Univ, Sch Comp Sci & Technol, Yantai 264005, Peoples R China
[2] Coinnovat Ctr Shandong Coll & Univ Future Intelli, Yantai 264005, Peoples R China
[3] Shandong Technol & Business Univ, Key Lab Intelligent Informat Proc Univ Shandong, Yantai 264005, Peoples R China
[4] Shandong Technol & Business Univ, Sch Informat & Elect Engn, Yantai 264005, Peoples R China
[5] Dalian Maritime Univ, Informat Sci & Technol Coll, Dalian 116026, Peoples R China
[6] Liaoning Normal Univ, Sch Comp & Informat Technol, Dalian 116029, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Supervised latent Dirichlet allocation; Semantic word vector; Word2vec; Word embedding; Semantic similarity; Text categorization;
DOI
10.23919/cje.2021.00.113
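Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract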
To address semantic loss in text representation, this paper proposes wt2svec, a new method for embedding words in a semantic space, based on supervised latent Dirichlet allocation (SLDA) and Word2vec. SLDA is used to generate a global topic embedding word vector, since it can discover global semantic information through the latent topics of the whole document set. Word2vec is used to obtain a local semantic embedding word vector. The new semantic word vector is obtained by combining the global semantic information with the local semantic information. In addition, a document semantic vector named doc2svec is generated. Experimental results on different datasets show that, compared with Word2vec, the wt2svec model noticeably improves the accuracy of word semantic similarity and improves text categorization performance.
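The abstract does not state how the global and local vectors are combined. As a rough illustration only, the sketch below assumes the simplest possible rule: L2-normalize each part and concatenate them. The function names, the toy vectors, and the concatenation rule itself are hypothetical and are not taken from the paper; in practice the global part would come from a trained SLDA model and the local part from a trained Word2vec model.

```python
import numpy as np


def l2_normalize(v):
    """Scale a vector to unit length; return it unchanged if it is all zeros."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def combine_embeddings(topic_vec, local_vec):
    """Join a global (topic) vector and a local (context) vector.

    ASSUMPTION: concatenation of the two normalized parts. The paper's
    actual combination rule is not given in the abstract.
    """
    return np.concatenate([l2_normalize(topic_vec), l2_normalize(local_vec)])


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Toy stand-ins (hypothetical values): per-word topic weights over 3 topics
# and 4-dimensional context embeddings.
topic_vectors = {
    "bank": np.array([0.7, 0.2, 0.1]),
    "loan": np.array([0.6, 0.3, 0.1]),
    "river": np.array([0.1, 0.1, 0.8]),
}
local_vectors = {
    "bank": np.array([0.5, 0.1, -0.2, 0.3]),
    "loan": np.array([0.4, 0.2, -0.1, 0.3]),
    "river": np.array([-0.3, 0.4, 0.5, 0.0]),
}

# Combined semantic word vectors, one per word.
combined = {
    word: combine_embeddings(topic_vectors[word], local_vectors[word])
    for word in topic_vectors
}

sim_bank_loan = cosine_similarity(combined["bank"], combined["loan"])
sim_bank_river = cosine_similarity(combined["bank"], combined["river"])
print(f"bank~loan:  {sim_bank_loan:.3f}")
print(f"bank~river: {sim_bank_river:.3f}")
```

Normalizing each part before concatenation gives the global and local components equal weight in the cosine similarity; a weighted sum or a learned projection would be equally plausible alternatives.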
Pages: 647-654
Number of pages: 8
Related papers
50 in total
  • [1] Representation of Semantic Word Embeddings Based on SLDA and Word2vec Model
    TANG Huanling
    ZHU Hui
    WEI Hongmin
    ZHENG Han
    MAO Xueli
    LU Mingyu
    GUO Jin
    [J]. Chinese Journal of Electronics, 2023, 32 (03) : 647 - 654
  • [2] Stability of Word Embeddings Using Word2Vec
    Chugh, Mansi
    Whigham, Peter A.
    Dick, Grant
    [J]. AI 2018: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, 11320 : 812 - 818
  • [3] Word Semantic Similarity Calculation Based on Word2vec
    Jin, Xiaolin
    Zhang, Shuwu
    Liu, Jie
    [J]. 2018 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES (ICCAIS), 2018, : 12 - 16
  • [4] Word Clustering based on Word2vec and Semantic Similarity
    Luo Jie
    Wang Qinglin
    Li Yuan
    [J]. 2014 33RD CHINESE CONTROL CONFERENCE (CCC), 2014, : 517 - 521
  • [5] Improving Word Representation by Tuning Word2Vec Parameters with Deep Learning Model
    Tezgider, Murat
    Yildiz, Beytullah
    Aydin, Galip
    [J]. 2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP), 2018,
  • [6] STMC: Semantic Tag Medical Concept using Word2Vec representation
    Martinez Soriano, Ignacio
    Castro Pena, Juan Luis
    [J]. 2018 31ST IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS 2018), 2018, : 393 - 398
  • [7] Word2vec Semantic Representation in Multilabel Classification for Indonesian News Article
    Rahmawati, Dyah
    Khodra, Masayu Leylia
    [J]. 2016 INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS - CONCEPTS, THEORY AND APPLICATION (ICAICTA), 2016,
  • [8] ECG analysis based on Word2Vec model
    Oliinyk, Yurii
    Tereschenko, Andrii
    Baklan, Igor
    Beraudo, Elisa
    [J]. IDDM 2021: INFORMATICS & DATA-DRIVEN MEDICINE: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON INFORMATICS & DATA-DRIVEN MEDICINE (IDDM 2021), 2021, 3038 : 213 - 222
  • [9] Analysis of the Word2Vec Model for Semantic Similarities in Indonesian Words
    Manalu, Louisten Novandi T.
    Bijaksana, Moch Arif
    Suryani, Arie Ardiyanti
    [J]. 2019 7TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICOICT), 2019, : 363 - 367
  • [10] Word2vec's Distributed Word Representation for Hindi Word Sense Disambiguation
    Kumari, Archana
    Lobiyal, D. K.
    [J]. DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY (ICDCIT 2020), 2020, 11969 : 325 - 335