Research on Word Vector Training Method Based on Improved Skip-Gram Algorithm

Cited by: 4
|
Author(s)
Tang, Yachun [1 ]
Affiliation
[1] Hunan Univ Sci & Engn, Coll Informat Engn, Yongzhou 425199, Peoples R China
Funding
Natural Science Foundation of Hunan Province;
Keywords
CLASSIFICATION;
DOI
10.1155/2022/4414207
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject classification codes
0808 ; 0809 ;
Abstract
An effective word-vector training method yields semantically rich word vectors and, in turn, better results on the same downstream tasks. To address the shortcomings of the traditional skip-gram model in encoding and modeling context words, this study proposes an improved word-vector training method based on the skip-gram algorithm. Building on an analysis of the existing skip-gram model, the distributional hypothesis is introduced: the distribution of each word over its contexts is taken as that word's representation, the word is placed in the semantic space, and the word is then modelled there, so that modeling benefits from the smoothing that the shared semantic space provides. During training, stochastic gradient descent is used to solve for the vector representation of each word and each Chinese character. The proposed method is compared with skip-gram, CWE+P, and SEING on a word-sense similarity task and a text classification task. Experimental results show that the proposed method has a significant advantage on the Chinese word segmentation task, with a performance gain of about 30%. The method provides a reference for further study of word vectors and text mining.
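The abstract describes skip-gram training solved by stochastic gradient descent. The sketch below is not the paper's improved method; it is a minimal, illustrative skip-gram with negative sampling in NumPy, where the toy corpus, window size, embedding dimension, and learning rate are all assumptions chosen for demonstration.

```python
# Minimal skip-gram with negative sampling, trained by SGD (illustrative
# sketch only; hyperparameters and corpus are arbitrary assumptions).
import numpy as np

rng = np.random.default_rng(0)
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D, window, lr, negatives = len(vocab), 16, 2, 0.05, 3

# Separate input (target) and output (context) embedding matrices.
W_in = rng.normal(0, 0.1, (V, D))
W_out = rng.normal(0, 0.1, (V, D))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(50):
    for pos, word in enumerate(corpus):
        t = idx[word]
        lo, hi = max(0, pos - window), min(len(corpus), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos == pos:
                continue
            c = idx[corpus[ctx_pos]]
            # One observed (positive) pair plus a few random negatives.
            samples = [(c, 1.0)] + [(int(rng.integers(V)), 0.0)
                                    for _ in range(negatives)]
            grad_in = np.zeros(D)
            for s, label in samples:
                score = sigmoid(W_in[t] @ W_out[s])
                g = lr * (score - label)        # log-loss gradient step
                grad_in += g * W_out[s]
                W_out[s] -= g * W_in[t]
            W_in[t] -= grad_in                  # SGD update for the target

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words that share contexts ("cat"/"dog", "mat"/"rug") tend to move closer.
print(cos(W_in[idx["cat"]], W_in[idx["dog"]]))
```

In the paper's variant, the representation fed into this kind of update would instead be built from the word's distribution over its contexts, but the SGD loop itself follows the same pattern.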
Pages: 8
Related Papers (50 records)
  • [1] A Word Embeddings Training Method Based on Modified Skip-Gram and Align
    Xing, Chang-shuai
    Zhou, Gang
    Lu, Ji-cang
    Zhang, Feng-juan
    CLOUD COMPUTING AND SECURITY, PT VI, 2018, 11068 : 334 - 346
  • [2] New Generation Model of Word Vector Representation Based on CBOW or Skip-Gram
    Xiong, Zeyu
    Shen, Qiangqiang
    Xiong, Yueshan
    Wang, Yijie
    Li, Weizi
    CMC-COMPUTERS MATERIALS & CONTINUA, 2019, 60 (01): : 259 - 273
  • [3] Improved Skip-Gram Based on Graph Structure Information
    Wang, Xiaojie
    Zhao, Haijun
    Chen, Huayue
    SENSORS, 2023, 23 (14)
  • [4] Extrapolating human judgments from skip-gram vector representations of word meaning
    Hollis, Geoff
    Westbury, Chris
    Lefsrud, Lianne
    QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 2017, 70 (08): : 1603 - 1619
  • [5] SubGram: Extending Skip-Gram Word Representation with Substrings
    Kocmi, Tom
    Bojar, Ondrej
    TEXT, SPEECH, AND DIALOGUE, 2016, 9924 : 182 - 189
  • [6] Skip-Gram - Zipf plus Uniform = Vector Additivity
    Gittens, Alex
    Achlioptas, Dimitris
    Mahoney, Michael W.
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 69 - 76
  • [7] Domain Named Entity Recognition Method Based on Skip-gram Model
    Feng Yan-hong
    Yu Hong
    Sun Geng
    Yu Xun-ran
    PROCEEDINGS FIRST INTERNATIONAL CONFERENCE ON ELECTRONICS INSTRUMENTATION & INFORMATION SYSTEMS (EIIS 2017), 2017, : 510 - 514
  • [8] The design of electronic medical records system using Skip-gram algorithm
    Yu, Tianjiao
    NETWORK MODELING AND ANALYSIS IN HEALTH INFORMATICS AND BIOINFORMATICS, 2021, 10 (01):
  • [9] The design of electronic medical records system using Skip-gram algorithm
    Tianjiao Yu
    Network Modeling Analysis in Health Informatics and Bioinformatics, 2021, 10
  • [10] SkipCas: Information Diffusion Prediction Model Based on Skip-Gram
    Ren, Dedong
    Liu, Yong
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT II, 2023, 13714 : 258 - 273