Improving bilingual word embeddings mapping with monolingual context information

Cited by: 1
Authors
Zhu, Shaolin [2 ]
Mi, Chenggang [1 ]
Li, Tianqi [2 ]
Zhang, Fuhua [2 ]
Zhang, Zhifeng [2 ]
Sun, Yu [2 ]
Affiliations
[1] Northwestern Polytech Univ, Xian 710129, Peoples R China
[2] Zhengzhou Univ Light Ind, Zhengzhou 450002, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bilingual word embeddings; Low-resource; Unsupervised method
DOI
10.1007/s10590-021-09274-0
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bilingual word embeddings (BWEs) play an important role in many natural language processing (NLP) tasks, especially cross-lingual tasks such as machine translation (MT) and cross-language information retrieval. Most existing methods for training BWEs rely on bilingual supervision, yet bilingual resources are unavailable for many low-resource language pairs. Some studies have addressed this issue with unsupervised methods, but they do not use monolingual context information to improve low-resource BWEs. To address these issues, we propose an unsupervised method that improves BWEs using optimized monolingual context information, without any parallel corpora. Specifically, we first build a bilingual word embedding mapping model between two languages by aligning their monolingual word embedding spaces through unsupervised adversarial training. To further improve these mappings, we then use monolingual context information to optimize them during training. Experimental results show that our method significantly outperforms the baseline systems, including on four low-resource language pairs.
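The abstract does not give the mapping model's equations. As a rough, hypothetical illustration of what "aligning monolingual word embedding spaces" means, the sketch below uses the orthogonal Procrustes solution, a standard way of fitting a linear map between two embedding spaces once some putative word pairs are available (for example, pairs induced after an adversarial stage). It is not the authors' implementation, and it omits the adversarial training and the monolingual-context optimization the paper describes. The function names `procrustes` and `translate` and the toy data are assumptions made for illustration only.

```python
import numpy as np


def procrustes(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F (orthogonal Procrustes).

    X, Y: (n, d) arrays; row i of X and row i of Y are the source- and
    target-language embeddings of one putative translation pair.
    """
    # SVD of the d x d cross-covariance matrix X^T Y = U S V^T
    U, _, Vt = np.linalg.svd(X.T @ Y)
    # The optimal orthogonal map is U V^T
    return U @ Vt


def translate(src_vecs, W, tgt_matrix):
    """Index of the nearest target vector (cosine similarity) for each mapped source vector."""
    mapped = src_vecs @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    return (mapped @ tgt.T).argmax(axis=1)


# Toy check (hypothetical data): the "target" space is a hidden rotation
# of the "source" space, so the learned map should recover that rotation.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # 200 toy source embeddings
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # hidden orthogonal rotation
Y = X @ Q                                     # matching toy target embeddings
W = procrustes(X, Y)
print(np.allclose(W, Q))        # True
print(translate(X[:5], W, Y))   # [0 1 2 3 4]
```

Constraining the map to be orthogonal preserves distances and angles within the source space, which is why it is a common choice for refining an adversarially initialized mapping. Real embeddings are only approximately isomorphic, so on real data the recovered map would not be exact.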
Pages: 503-518
Number of pages: 16