On the Limitations of Unsupervised Bilingual Dictionary Induction

Authors
Sogaard, Anders [1]
Ruder, Sebastian [2, 3]
Vulic, Ivan [4]
Affiliations
[1] Univ Copenhagen, Copenhagen, Denmark
[2] Natl Univ Ireland, Insight Res Ctr, Galway, Ireland
[3] Aylien Ltd, Dublin, Ireland
[4] Univ Cambridge, Language Technol Lab, Cambridge, England
Funding
Science Foundation Ireland
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Unsupervised machine translation (i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but nevertheless, Lample et al. (2018a) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent-marking, and when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
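The abstract does not spell out how the identical-word signal is used. As a rough illustration only, the sketch below assumes that words spelled the same in both vocabularies serve as a seed dictionary and that a linear map is then fitted with orthogonal Procrustes. Both choices are assumptions rather than details stated in this record, and the function names, toy vocabularies, and random embeddings are hypothetical.

```python
import numpy as np

def identical_word_seed(src_vocab, tgt_vocab):
    """Return index pairs (i, j) for words spelled identically in both vocabularies."""
    tgt_index = {w: j for j, w in enumerate(tgt_vocab)}
    return [(i, tgt_index[w]) for i, w in enumerate(src_vocab) if w in tgt_index]

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F, where rows of X and Y are paired."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def induce_dictionary(src_emb, tgt_emb, W):
    """For each source word, return the index of its nearest target word (cosine)."""
    mapped = src_emb @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    return np.argmax(mapped @ tgt.T, axis=1)

# Toy setup (hypothetical data): the source space is an exact rotation of the
# target space, so a correct map should recover every translation pair.
src_vocab = ["taxi", "radio", "hotel", "pizza", "hund", "haus"]
tgt_vocab = ["taxi", "radio", "hotel", "pizza", "dog", "house"]

rng = np.random.default_rng(0)
tgt_emb = rng.normal(size=(6, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
src_emb = tgt_emb @ Q.T                        # so that src_emb @ Q == tgt_emb

seed = identical_word_seed(src_vocab, tgt_vocab)
src_idx = [i for i, _ in seed]
tgt_idx = [j for _, j in seed]

W = procrustes_map(src_emb[src_idx], tgt_emb[tgt_idx])
pred = induce_dictionary(src_emb, tgt_emb, W)

for i, j in enumerate(pred):
    print(src_vocab[i], "->", tgt_vocab[j])
```

In this toy case the four identically spelled words are enough to determine the rotation in three dimensions, so the two remaining words are also matched. Real embedding spaces are only approximately related by a rotation, which is where the robustness issues described in the abstract arise.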
Pages: 778-788
Page count: 11