GNEG: Graph-Based Negative Sampling for word2vec

Cited by: 0
Authors
Zhang, Zheng [1, 2]
Zweigenbaum, Pierre [1]
Affiliations
[1] Univ Paris Saclay, CNRS, LIMSI, Orsay, France
[2] Univ Paris Saclay, CNRS, Univ Paris Sud, LRI, Orsay, France
Keywords
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Negative sampling is an important component of word2vec for distributed word representation learning. We hypothesize that taking into account global, corpus-level information and generating a different noise distribution for each target word better satisfies the requirements of negative examples for each training word than the original frequency-based distribution. For this purpose, we pre-compute word co-occurrence statistics from the corpus and apply network algorithms such as random walks to them. We test this hypothesis through a set of experiments whose results show that our approach improves performance on the word analogy task by about 5% and on word similarity tasks by about 1% compared to the skip-gram negative sampling baseline.
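The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the window size, the number of random-walk steps, the exclusion of the target word from its own negatives, and all function names (`cooccurrence_counts`, `transition_matrix`, `random_walk`, `sample_negatives`) are assumptions made for illustration. The sketch counts word co-occurrences, row-normalises them into a transition matrix, takes a short random walk on that graph, and uses each resulting row as the noise distribution for one target word.

```python
import random
from collections import defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count symmetric word co-occurrences within a fixed window (assumed setting)."""
    counts = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    counts[w][sent[j]] += 1.0
    return counts


def transition_matrix(counts, vocab):
    """Row-normalise the counts into random-walk transition probabilities."""
    index = {w: k for k, w in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in vocab]
    for w, row in counts.items():
        total = sum(row.values())
        for c, n in row.items():
            matrix[index[w]][index[c]] = n / total
    return matrix


def random_walk(matrix, steps):
    """Return matrix**steps: row i is the distribution after `steps` hops from word i."""
    size = len(matrix)
    result = [row[:] for row in matrix]
    for _ in range(steps - 1):
        result = [
            [sum(result[i][k] * matrix[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)
        ]
    return result


def sample_negatives(noise_row, vocab, target, k, rng):
    """Draw k negatives from the target's own noise distribution, excluding the target."""
    weights = [0.0 if w == target else p for w, p in zip(vocab, noise_row)]
    return rng.choices(vocab, weights=weights, k=k)


# Toy corpus (hypothetical data, for illustration only).
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
vocab = sorted({w for s in sentences for w in s})
counts = cooccurrence_counts(sentences, window=2)
noise = random_walk(transition_matrix(counts, vocab), steps=2)
rng = random.Random(0)
negatives = sample_negatives(noise[vocab.index("cat")], vocab, "cat", k=5, rng=rng)
```

Unlike the frequency-based baseline, which draws negatives for every target from one shared distribution, each row of `noise` here is specific to a single target word. A real vocabulary would call for sparse matrices rather than nested lists.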
Pages: 566-571
Number of pages: 6