Research on Word Vector Training Method Based on Improved Skip-Gram Algorithm

Cited by: 4
Author
Tang, Yachun [1]
Affiliation
[1] Hunan Univ Sci & Engn, Coll Informat Engn, Yongzhou 425199, Peoples R China
Funding
Natural Science Foundation of Hunan Province
Keywords
CLASSIFICATION
DOI
10.1155/2022/4414207
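Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract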
An effective word vector training method yields semantically rich word vectors, which in turn give better results on the same downstream task. To address the shortcomings of the traditional skip-gram model in encoding and modeling context words, this study proposes an improved word vector training method based on the skip-gram algorithm. Building on an analysis of the existing skip-gram model, the method introduces the distributional hypothesis: the distribution of each word over its contexts is taken as the representation of that word, the word is placed in a semantic space, and it is then modeled there, with word smoothing and the semantic space together improving the modeling. During training, stochastic gradient descent is used to solve for the vector representation of each word and each Chinese character. In the experiments, the proposed training method is compared with skip-gram, CWE+P, and SEING on a word sense similarity task and a text classification task. The experimental results show that the proposed method has significant advantages in the Chinese word segmentation task, with a performance gain of about 30%. The proposed method provides a reference for further study of word vectors and text mining.
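For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of one stochastic gradient descent update for the standard skip-gram model with negative sampling, not the improved method proposed in the paper. The function name `sgns_step`, the matrices `W_in` and `W_out`, the learning rate, and the toy vocabulary size are all assumptions made for this example.

```python
import numpy as np

def sgns_step(W_in, W_out, center, context, negatives, lr=0.05):
    """One SGD update of skip-gram with negative sampling (hypothetical helper).

    W_in  : (vocab, dim) center-word vectors, updated in place
    W_out : (vocab, dim) context-word vectors, updated in place
    Returns the loss computed before the update.
    """
    v = W_in[center].copy()                          # center-word vector
    targets = np.array([context] + list(negatives))  # true context first, then negatives
    labels = np.zeros(len(targets))
    labels[0] = 1.0                                  # only the true context is a positive
    U = W_out[targets]                               # copy of the relevant context vectors
    probs = 1.0 / (1.0 + np.exp(-(U @ v)))           # sigmoid of the dot products
    eps = 1e-10                                      # guards against log(0)
    loss = -np.log(probs[0] + eps) - np.sum(np.log(1.0 - probs[1:] + eps))
    grad = probs - labels                            # d(loss)/d(score) for each target
    W_in[center] -= lr * (grad @ U)                  # update the center vector
    np.subtract.at(W_out, targets, lr * np.outer(grad, v))  # update context vectors
    return loss

# Toy run: 5-word vocabulary, 4-dimensional vectors, one repeated training pair.
rng = np.random.default_rng(0)
W_in = rng.normal(0.0, 0.1, size=(5, 4))
W_out = rng.normal(0.0, 0.1, size=(5, 4))
losses = [sgns_step(W_in, W_out, center=0, context=1, negatives=[2, 3])
          for _ in range(200)]
print(losses[0], losses[-1])  # the loss should fall as the pair is repeated
```

In a real training loop the (center, context) pairs would come from a sliding window over a corpus and the negatives would be drawn from a noise distribution; both are omitted here to keep the sketch short.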
Pages: 8