Considerations about learning Word2Vec

Cited by: 31
Authors
Di Gennaro, Giovanni [1 ]
Buonanno, Amedeo [2 ]
Palmieri, Francesco A. N. [1 ]
Affiliations
[1] Univ Campania Luigi Vanvitelli, Dipartimento Ingn, Via Roma 29, I-81031 Aversa, CE, Italy
[2] ENEA, Dept Energy Technol & Renewable Energy Sources, Res Ctr Portici, PE Fermi 1, Portici, NA, Italy
Source
JOURNAL OF SUPERCOMPUTING | 2021, Vol. 77, No. 11
Keywords
Word embedding; Natural language processing; Neural networks; MEMORY
DOI
10.1007/s11227-021-03743-2
Chinese Library Classification (CLC): TP3 [computing technology; computer technology]
Discipline Code: 0812
Abstract
Despite the wide diffusion and use of the embeddings generated through Word2Vec, many open questions remain about the reasons for its results and about its real capabilities. In particular, to our knowledge, no author seems to have analysed in detail how learning may be affected by the various choices of hyperparameters. In this work, we try to shed some light on various issues, focusing on a typical dataset. It is shown that the learning rate prevents an exact mapping of the co-occurrence matrix, that Word2Vec is unable to learn syntactic relationships, and that it does not suffer from overfitting. Furthermore, through the creation of an ad hoc network, it is also shown how Word2Vec can be improved directly on the analogies, obtaining very high accuracy without damaging the pre-existing embedding. This analogy-enhanced Word2Vec may be convenient in various NLP scenarios, but it is used here as an optimal starting point to evaluate the limits of Word2Vec.
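As background for the analogy tests the abstract refers to, the following is a minimal count-based sketch of the "a is to b as c is to ?" mechanism, answered by the nearest neighbour of b - a + c. The toy corpus and the use of raw co-occurrence rows as stand-in embeddings are assumptions for illustration only, not the paper's setup: Word2Vec itself learns dense vectors by stochastic gradient descent on skip-gram (or CBOW) pairs.

```python
import numpy as np

# Toy corpus chosen so that king/queen and man/woman share contexts;
# purely illustrative, not the dataset used in the paper.
corpus = [
    "king he crown".split(),
    "queen she crown".split(),
    "man he person".split(),
    "woman she person".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-2 token window.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1.0

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' by the nearest cosine
    neighbour of b - a + c, excluding the three query words."""
    v = C[idx[b]] - C[idx[a]] + C[idx[c]]
    sims = C @ v / (np.linalg.norm(C, axis=1) * np.linalg.norm(v) + 1e-9)
    for k in np.argsort(-sims):
        if vocab[k] not in (a, b, c):
            return vocab[k]

print(analogy("man", "king", "woman"))  # -> queen
```

In this sketch the offset b - a + c lands exactly on the co-occurrence row of the answer, which is why the simple vector-arithmetic test works; with trained Word2Vec embeddings the match is only approximate, which is precisely what the paper's analogy-enhanced variant is designed to tighten.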
Pages: 12320-12335 (16 pages)
Related Papers (50 total)
  • [1] The Spectral Underpinning of word2vec
    Jaffe, Ariel
    Kluger, Yuval
    Lindenbaum, Ofir
    Patsenker, Jonathan
    Peterfreund, Erez
    Steinerberger, Stefan
    [J]. FRONTIERS IN APPLIED MATHEMATICS AND STATISTICS, 2020, 6
  • [2] Emerging Trends: Word2Vec
    Church, Kenneth Ward
    [J]. NATURAL LANGUAGE ENGINEERING, 2017, 23 (01) : 155 - 162
  • [3] Stability of Word Embeddings Using Word2Vec
    Chugh, Mansi
    Whigham, Peter A.
    Dick, Grant
    [J]. AI 2018: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, 11320 : 812 - 818
  • [4] Word2vec for Arabic Word Sense Disambiguation
    Laatar, Rim
    Aloulou, Chafik
    Belguith, Lamia Hadrich
    [J]. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2018), 2018, 10859 : 308 - 311
  • [5] PTPD: predicting therapeutic peptides by deep learning and word2vec
    Wu, Chuanyan
    Gao, Rui
    Zhang, Yusen
    De Marinis, Yang
    [J]. BMC BIOINFORMATICS, 2019, 20 (01)
  • [6] The new deep learning architecture based on GRU and word2vec
    Atassi, Abdelhamid
    El Azami, Ikram
    Sadiq, Abdelalim
    [J]. 2018 INTERNATIONAL CONFERENCE ON ELECTRONICS, CONTROL, OPTIMIZATION AND COMPUTER SCIENCE (ICECOCS), 2018
  • [7] Classification Turkish SMS with Deep Learning Tool Word2Vec
    Karasoy, Onur
    Balli, Serkan
    [J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017, : 294 - 297
  • [8] Improving Word Representation by Tuning Word2Vec Parameters with Deep Learning Model
    Tezgider, Murat
    Yildiz, Beytullah
    Aydin, Galip
    [J]. 2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP), 2018