Intrinsic Evaluation of Lithuanian Word Embeddings Using WordNet

Cited by: 4
Authors
Kapociute-Dzikiene, Jurgita [1]
Damasevicius, Robertas [2]
Affiliations
[1] Vytautas Magnus Univ, K Donelaicio 58, LT-44248 Kaunas, Lithuania
[2] Kaunas Univ Technol, K Donelaicio 73, LT-44029 Kaunas, Lithuania
Keywords
Intrinsic evaluation; Neural word embeddings; The Lithuanian language
DOI
10.1007/978-3-319-91189-2_39
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neural network-based word embeddings, which outperform traditional approaches in various Natural Language Processing tasks, have gained a lot of interest recently. Despite this, Lithuanian word embeddings have never been obtained and evaluated before. Here we used a Lithuanian corpus of approximately 234 thousand running words and produced several word embedding models: based on the continuous bag-of-words and skip-gram architectures; softmax and negative sampling training algorithms; and a varied number of dimensions (100, 300, 500, and 1,000). The word embeddings were evaluated using the Lithuanian WordNet as the resource for the synonym search. We determined the superiority of the continuous bag-of-words over the skip-gram architecture, while the training algorithm and dimensionality showed no significant impact on the results. The best results were achieved with the continuous bag-of-words, negative sampling and 1,000 dimensions.
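The experimental design summarized in the abstract can be illustrated with a minimal Python sketch. It enumerates the model grid (2 architectures, 2 training algorithms, 4 dimensionalities) and scores a set of vectors by a synonym search against a synonym dictionary. The function names, the hit-rate metric, and the toy vectors are assumptions made for illustration; the abstract does not state the exact metric the authors used, and no real embeddings or WordNet data are involved here.

```python
from itertools import product
import math

# Configuration values named in the abstract.
ARCHITECTURES = ("cbow", "skip-gram")
TRAINING = ("softmax", "negative-sampling")
DIMENSIONS = (100, 300, 500, 1000)


def model_grid():
    """Return every (architecture, training algorithm, dimensions) combination."""
    return list(product(ARCHITECTURES, TRAINING, DIMENSIONS))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word, vectors, k=1):
    """Return the k words most similar to `word` by cosine similarity."""
    scores = [(cosine(vectors[word], vec), other)
              for other, vec in vectors.items() if other != word]
    scores.sort(reverse=True)
    return [other for _, other in scores[:k]]


def synonym_hit_rate(vectors, synsets, k=1):
    """Share of query words whose k nearest neighbours include a listed synonym.

    This metric is an assumption for illustration, not the paper's stated measure.
    """
    queries = [w for w in synsets if w in vectors]
    hits = sum(1 for w in queries if set(nearest(w, vectors, k)) & synsets[w])
    return hits / len(queries) if queries else 0.0


# Toy data (hypothetical, not taken from the paper or the Lithuanian WordNet).
toy_vectors = {
    "namas": [1.0, 0.1],
    "pastatas": [0.9, 0.2],
    "upė": [0.1, 1.0],
}
toy_synsets = {"namas": {"pastatas"}, "upė": {"upelis"}}

if __name__ == "__main__":
    print(len(model_grid()), "model configurations")
    print("hit rate:", synonym_hit_rate(toy_vectors, toy_synsets))
```

In a real replication, each configuration from `model_grid()` would be trained on the corpus and the resulting vectors passed to the evaluation function, so that architectures, training algorithms, and dimensionalities can be compared on the same synonym resource.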
Pages: 394-404
Page count: 11