Unsupervised Cross-lingual Transfer of Word Embedding Spaces

Cited by: 0
Authors
Xu, Ruochen [1]
Yang, Yiming [1]
Otani, Naoki [1]
Wu, Yuexin [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
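Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract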
Cross-lingual transfer of word embeddings aims to establish semantic mappings among words in different languages by learning transformation functions over the corresponding word embedding spaces. Successfully solving this problem would benefit many downstream tasks, such as transferring text classification models from resource-rich languages (e.g., English) to low-resource languages. Supervised methods for this problem rely on cross-lingual supervision, using either parallel corpora or bilingual lexicons as labeled training data, which may not be available for many low-resource languages. This paper proposes an unsupervised learning approach that does not require any cross-lingual labeled data. Given two monolingual word embedding spaces for any language pair, our algorithm optimizes the transformation functions in both directions simultaneously, based on distributional matching and on minimizing the back-translation losses. We use a neural network implementation to calculate the Sinkhorn distance, a well-defined distributional similarity measure, and optimize our objective through back-propagation. Our evaluation on benchmark datasets for bilingual lexicon induction and cross-lingual word similarity prediction shows that the proposed method performs better than or competitively with state-of-the-art supervised and unsupervised baseline methods over many language pairs.
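The two ingredients named in the abstract, a Sinkhorn distance for distributional matching and a back-translation loss, can be illustrated with a small NumPy sketch. This is not the paper's neural network implementation and involves no back-propagation; the function names, the squared Euclidean cost, the uniform point weights, the regularization value `reg=0.1`, and the toy data (a random rotation standing in for a second language) are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_distance(X, Y, reg=0.1, n_iters=200):
    """Entropy-regularized transport cost between two point clouds.

    Rows of X and Y are embeddings; every point carries uniform mass
    (an assumption of this sketch). Returns (cost, transport_plan).
    """
    n, m = X.shape[0], Y.shape[0]
    # Pairwise squared Euclidean distances, shape (n, m).
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-C / reg)               # Gibbs kernel
    a = np.full(n, 1.0 / n)            # uniform source marginal
    b = np.full(m, 1.0 / m)            # uniform target marginal
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)                # rescale rows toward marginal a
        v = b / (K.T @ u)              # rescale columns toward marginal b
    P = u[:, None] * K * v[None, :]    # approximate transport plan
    return float((P * C).sum()), P


def back_translation_loss(X, G, F):
    """Mean squared error after mapping X forward with G and back with F."""
    X_back = X @ G.T @ F.T
    return float(((X_back - X) ** 2).sum(axis=1).mean())


# Toy setup (hypothetical): the "target language" is a rotated copy of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # unit-length embeddings
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))   # random orthogonal matrix
Y = X @ Q.T

misaligned, _ = sinkhorn_distance(X, Y)          # no mapping applied
aligned, plan = sinkhorn_distance(X @ Q.T, Y)    # correct mapping applied
bt = back_translation_loss(X, Q, Q.T)            # forward Q, backward Q^T

print("Sinkhorn distance without mapping:", misaligned)
print("Sinkhorn distance with mapping:   ", aligned)
print("Back-translation loss:            ", bt)
```

In this toy setting the correct mapping lowers the Sinkhorn distance and the round trip through `Q` and its transpose reconstructs `X`, which is the behavior an objective combining the two terms would reward. Unit-length embeddings keep the cost matrix bounded so that the kernel `K` does not underflow at this regularization value; larger costs would call for a log-domain variant.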
Pages: 2465-2474
Page count: 10