Improving Word Embeddings for Low Frequency Words by Pseudo Contexts

Cited by: 1
Authors
Li, Fang [1]
Wang, Xiaojie [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp, Beijing, Peoples R China
Keywords
Word embedding; Low frequency word
DOI
10.1007/978-3-319-69005-6_4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates the relation between word semantic density and word frequency. Average word similarity, computed from distributed representations, is defined as the measure of word semantic density. We find that the average similarity of low-frequency words is consistently higher than that of high-frequency words, and that the average similarity stabilizes once word frequency reaches around 400. This finding holds across changes in the size of the training corpus, the dimension of the distributed representations, and the number of negative samples in the skip-gram model. It also holds across 17 different languages. Based on this finding, we propose a pseudo-context skip-gram model, which makes use of the context words of the semantic nearest neighbors of target words. Experimental results show that our model achieves significant performance improvements on both word similarity and analogy tasks.
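The abstract does not give the exact formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions: average similarity is taken here to be the mean cosine similarity between a word and every other word in the vocabulary, and pseudo contexts are taken to be the context words of a word's k nearest neighbors in embedding space. The function names, the toy embeddings, and the toy context lists are all hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_similarity(word, embeddings):
    """Assumed density measure: mean cosine similarity to all other words."""
    others = [w for w in embeddings if w != word]
    total = sum(cosine(embeddings[word], embeddings[w]) for w in others)
    return total / len(others)

def nearest_neighbors(word, embeddings, k):
    """The k words most similar to `word` by cosine similarity."""
    others = [w for w in embeddings if w != word]
    others.sort(key=lambda w: cosine(embeddings[word], embeddings[w]), reverse=True)
    return others[:k]

def pseudo_contexts(word, embeddings, contexts, k):
    """Assumed construction: borrow the context words of the k nearest neighbors."""
    borrowed = []
    for neighbor in nearest_neighbors(word, embeddings, k):
        borrowed.extend(contexts.get(neighbor, []))
    return borrowed

# Toy data (hypothetical): "cat" plays the role of a low-frequency word
# with no observed contexts of its own.
embeddings = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
contexts = {"dog": ["bark", "pet"], "car": ["drive"]}

print(average_similarity("cat", embeddings))
print(pseudo_contexts("cat", embeddings, contexts, k=1))
```

In a training pipeline along these lines, the borrowed context words would presumably be added as extra (target, context) pairs for the low-frequency word during skip-gram training; how the paper weights or samples them is not stated in the abstract.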
Pages: 37-47
Number of pages: 11