Adversarial training with Wasserstein distance for learning cross-lingual word embeddings

Cited by: 4
Authors
Li, Yuling [1]
Zhang, Yuhong [1]
Yu, Kui [2]
Hu, Xuegang [2]
Affiliations
[1] Hefei Univ Technol, Hefei, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Anhui, Peoples R China
Funding
US National Science Foundation;
Keywords
Cross-lingual word embeddings; Generative adversarial networks; Noise; NETWORKS; SPACE;
DOI
10.1007/s10489-020-02136-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent studies have managed to learn cross-lingual word embeddings in a completely unsupervised manner through generative adversarial networks (GANs). These GAN-based methods approximately align two monolingual embedding spaces, but their performance on the embeddings of low-frequency words (LFEs) remains unsatisfactory. The existing solution is to set low sampling rates for LFEs based on word-frequency information. However, this solution has two shortcomings. First, it relies on word-frequency information that is not always available in real scenarios. Second, the uneven sampling may cause the models to overlook the distribution information of LFEs, thereby negatively affecting their performance. In this study, we propose a novel unsupervised GAN-based method that effectively improves the quality of LFEs while circumventing both issues. Our method is based on the observation that LFEs tend to be densely clustered in the embedding space, and that fine-grained alignment of such dense embedding points through adversarial training is difficult. Building on this observation, we introduce a noise function that disperses the dense embedding points to a certain extent. In addition, we train a Wasserstein critic network to encourage the noise-added embeddings and the original embeddings to have similar semantics. We test our approach on two common evaluation tasks, namely bilingual lexicon induction and cross-lingual word similarity. Experimental results show that the proposed model achieves stronger or competitive performance compared with supervised and unsupervised baselines.
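The dispersal idea in the abstract can be illustrated with a small sketch. The abstract does not say which noise function the authors use, so isotropic Gaussian noise is an assumption here, and the function names, the noise scale `sigma`, and the synthetic "dense" points are all hypothetical. The sketch only shows that adding noise spreads out a tightly clustered set of vectors, measured by mean nearest-neighbor distance. It does not cover the adversarial training or the Wasserstein critic.

```python
import numpy as np

def add_gaussian_noise(embeddings, sigma, rng):
    """Return a copy of `embeddings` perturbed by isotropic Gaussian noise.

    Assumption: Gaussian noise stands in for the paper's noise function,
    which the abstract does not specify.
    """
    return embeddings + rng.normal(0.0, sigma, size=embeddings.shape)

def mean_nearest_neighbor_distance(points):
    """Average Euclidean distance from each point to its closest other point."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude each point's distance to itself
    return dists.min(axis=1).mean()

rng = np.random.default_rng(0)

# Synthetic stand-in for densely clustered low-frequency embeddings.
dense = rng.normal(0.0, 0.01, size=(200, 50))
noisy = add_gaussian_noise(dense, sigma=0.1, rng=rng)

before = mean_nearest_neighbor_distance(dense)
after = mean_nearest_neighbor_distance(noisy)
print(f"mean nearest-neighbor distance before: {before:.4f}, after: {after:.4f}")
```

With a noise scale larger than the spread of the cluster, the points move apart, which is the property the abstract relies on to make fine-grained alignment easier. In the described method, a separately trained critic would then keep the noisy embeddings semantically close to the originals.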
Pages: 7666 - 7678
Number of pages: 13
Related papers
50 in total
  • [1] Adversarial training with Wasserstein distance for learning cross-lingual word embeddings
    Li, Yuling
    Zhang, Yuhong
    Yu, Kui
    Hu, Xuegang
    [J]. Applied Intelligence, 2021, 51 : 7666 - 7678
  • [2] Unsupervised cross-lingual word embeddings learning with adversarial training
    Li, Yuling
    Zhang, Yuhong
    Li, Peipei
    Hu, Xuegang
    [J]. 2019 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK 2019), 2019, : 150 - 156
  • [3] Multi-Adversarial Learning for Cross-Lingual Word Embeddings
    Wang, Haozhou
    Henderson, James
    Merlo, Paola
    [J]. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 463 - 472
  • [4] Cross-Lingual Word Embeddings
    Søgaard, Anders
    Vulić, Ivan
    Ruder, Sebastian
    Faruqui, Manaal
    [J]. Synthesis Lectures on Human Language Technologies, 2019, 12 (02): : 1 - 132
  • [5] Cross-Lingual Word Embeddings
    Corro, Caio Filippo
    [J]. TRAITEMENT AUTOMATIQUE DES LANGUES, 2019, 60 (01): : 46 - 48
  • [6] Cross-Lingual Word Embeddings
    Agirre, Eneko
    [J]. COMPUTATIONAL LINGUISTICS, 2020, 46 (01) : 245 - 248
  • [7] Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings
    Wang, Haozhou
    Henderson, James
    Merlo, Paola
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 4419 - 4430
  • [8] Learning Tibetan-Chinese cross-lingual word embeddings
    Ma, Wei
    Yu, Hongzhi
    Zhao, Kun
    Zhao, Deshun
    [J]. 2019 15TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG 2019), 2019, : 49 - 53
  • [9] Refinement of Unsupervised Cross-Lingual Word Embeddings
    Biesialska, Magdalena
    Costa-jussa, Marta R.
    [J]. ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 1978 - 1981
  • [10] Interactive Refinement of Cross-Lingual Word Embeddings
    Yuan, Michelle
    Zhang, Mozhi
    Van Durme, Benjamin
    Findlater, Leah
    Boyd-Graber, Jordan
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 5984 - 5996