GNEG: Graph-Based Negative Sampling for word2vec

Cited: 0
Authors
Zhang, Zheng [1,2]
Zweigenbaum, Pierre [1]
Affiliations
[1] Univ Paris Saclay, CNRS, LIMSI, Orsay, France
[2] Univ Paris Saclay, CNRS, Univ Paris Sud, LRI, Orsay, France
DOI
None available
CLC classification
TP39 [Computer applications];
Discipline codes
081203 ; 0835 ;
Abstract
Negative sampling is an important component in word2vec for distributed word representation learning. We hypothesize that taking into account global, corpus-level information and generating a different noise distribution for each target word better satisfies the requirements of negative examples for each training word than the original frequency-based distribution. To this end, we pre-compute word co-occurrence statistics from the corpus and apply network algorithms such as random walks to them. We test this hypothesis through a set of experiments whose results show that our approach boosts the word analogy task by about 5% and improves performance on word similarity tasks by about 1% compared to the skip-gram negative sampling baseline.
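The abstract's idea (per-target noise distributions derived from a random walk over a word co-occurrence graph, rather than a single frequency-based distribution) can be illustrated with a toy sketch. This is not the paper's exact algorithm; the window size, walk length, and the use of a matrix power as the random walk are all assumptions made for illustration.

```python
import numpy as np

# Toy corpus; in practice this would be a large text corpus.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are animals".split(),
]

# 1. Build the vocabulary and a symmetric word co-occurrence matrix
#    within a fixed context window (window size 2 here, an assumption).
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)
C = np.zeros((V, V))
window = 2
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# 2. Normalize the co-occurrence graph into a row-stochastic transition
#    matrix and take a short random walk (t matrix-power steps) so that
#    probability mass also reaches indirect neighbours of each word.
P = C / C.sum(axis=1, keepdims=True)
t = 3
walk = np.linalg.matrix_power(P, t)

# 3. Each row of `walk` is that target word's own noise distribution:
#    negatives for "cat" are drawn from walk[idx["cat"]] instead of a
#    single global frequency-based distribution.
rng = np.random.default_rng(0)

def sample_negatives(target, k=5):
    p = walk[idx[target]]
    p = p / p.sum()  # guard against floating-point drift
    return rng.choice(V, size=k, p=p)

negs = [vocab[i] for i in sample_negatives("cat")]
print(negs)
```

In the skip-gram training loop, these graph-based samples would replace the negatives drawn from the unigram distribution raised to the 3/4 power.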
Pages: 566 - 571
Page count: 6
Related papers
50 in total
  • [31] Drug-Target Binding Affinity Prediction Based on Graph Neural Networks and Word2vec
    Xia, Minghao
    Hu, Jing
    Zhang, Xiaolong
    Lin, Xiaoli
    [J]. INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2022, PT II, 2022, 13394 : 496 - 506
  • [32] Scaling Word2Vec on Big Corpus
    Bofang Li
    Aleksandr Drozd
    Yuhe Guo
    Tao Liu
    Satoshi Matsuoka
    Xiaoyong Du
    [J]. Data Science and Engineering, 2019, 4 (02) : 157 - 175
  • [34] Application of Word2vec in Phoneme Recognition
    Feng, Xin
    Wang, Lei
    [J]. ICMLC 2020: 2020 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2020, : 495 - 499
  • [35] Considerations about learning Word2Vec
    Di Gennaro, Giovanni
    Buonanno, Amedeo
    Palmieri, Francesco A. N.
    [J]. JOURNAL OF SUPERCOMPUTING, 2021, 77 (11) : 12320 - 12335
  • [37] Acceleration of Word2vec Using GPUs
    Bae, Seulki
    Yi, Youngmin
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2016, PT II, 2016, 9948 : 269 - 279
  • [38] Parallel Data-Local Training for Optimizing Word2Vec Embeddings for Word and Graph Embeddings
    Moon, Gordon E.
    Newman-Griffis, Denis
    Kim, Jinsung
    Sukumaran-Rajam, Aravind
    Fosler-Lussier, Eric
    Sadayappan, P.
    [J]. PROCEEDINGS OF 2019 5TH IEEE/ACM WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2019), 2019, : 44 - 55
  • [39] Application of Output Embedding on Word2Vec
    Uchida, Shuto
    Yoshikawa, Tomohiro
    Furuhashi, Takeshi
    [J]. 2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018, : 1433 - 1436
  • [40] GraphTar: applying word2vec and graph neural networks to miRNA target prediction
    Jan Przybyszewski
    Maciej Malawski
    Sabina Lichołai
    [J]. BMC Bioinformatics, 24