Evaluating Sub-word Embeddings in Cross-lingual Models

Authors
Parizi, Ali Hakimi [1]
Cook, Paul [1]
Affiliation
[1] Univ New Brunswick, Fredericton, NB, Canada
Keywords
Cross-lingual Word Embeddings; Low-resource Languages; Morphologically-rich Languages
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Cross-lingual word embeddings create a shared space for embeddings in two languages, and enable knowledge to be transferred between languages for tasks such as bilingual lexicon induction. One problem, however, is out-of-vocabulary (OOV) words, for which no embeddings are available. This is particularly problematic for low-resource and morphologically-rich languages, which often have relatively high OOV rates. Approaches to learning sub-word embeddings have been proposed to address the problem of OOV words, but most prior work has not considered sub-word embeddings in cross-lingual models. In this paper, we consider whether sub-word embeddings can be leveraged to form cross-lingual embeddings for OOV words. Specifically, we consider a novel bilingual lexicon induction task focused on OOV words, for language pairs covering several language families. Our results indicate that cross-lingual representations for OOV words can indeed be formed from sub-word embeddings, including in the case of a truly low-resource morphologically-rich language.
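The abstract does not specify which sub-word model or cross-lingual mapping is used, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes fastText-style character n-gram embeddings (n from 3 to 6, with `<` and `>` word-boundary markers), assumes the source-language n-gram vectors have already been projected into the shared cross-lingual space, builds a vector for an OOV word by averaging the vectors of its known n-grams, and performs bilingual lexicon induction by cosine nearest-neighbour retrieval over target-language embeddings. The function names (`char_ngrams`, `compose_oov`, `translate`) and all toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Character n-grams of `word` wrapped in '<' and '>' markers (fastText-style, assumed)."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def compose_oov(word: str, ngram_vecs: Dict[str, Vector],
                n_min: int = 3, n_max: int = 6) -> Optional[Vector]:
    """Average the vectors of the word's n-grams that have embeddings.

    Assumes `ngram_vecs` already lives in the shared cross-lingual space.
    Returns None when none of the word's n-grams are known.
    """
    known = [ngram_vecs[g] for g in char_ngrams(word, n_min, n_max) if g in ngram_vecs]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def translate(src_vec: Sequence[float], target_emb: Dict[str, Vector], k: int = 1) -> List[str]:
    """Nearest-neighbour bilingual lexicon induction: top-k target words by cosine."""
    ranked = sorted(target_emb, key=lambda w: cosine(src_vec, target_emb[w]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical source-side trigram vectors, assumed already mapped into the shared space.
    ngram_vecs = {"<ca": [0.9, 0.1, 0.0], "cas": [1.0, 0.0, 0.1], "asa": [0.8, 0.2, 0.0]}
    # Hypothetical target-language embeddings in the same space.
    target_emb = {"houses": [1.0, 0.0, 0.0], "dogs": [0.0, 1.0, 0.0], "trees": [0.0, 0.0, 1.0]}

    oov_vec = compose_oov("casas", ngram_vecs, n_min=3, n_max=3)  # "casas" has no word vector
    print(translate(oov_vec, target_emb))  # ['houses']
```

Averaging known n-gram vectors mirrors how fastText represents unseen words; an evaluation like the one described would then check whether the retrieved target word matches a gold translation of the OOV source word.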
Pages: 2712-2719
Number of pages: 8