On the Limitations of Unsupervised Bilingual Dictionary Induction

Authors
Sogaard, Anders [1]
Ruder, Sebastian [2, 3]
Vulic, Ivan [4]
Affiliations
[1] Univ Copenhagen, Copenhagen, Denmark
[2] Natl Univ Ireland, Insight Res Ctr, Galway, Ireland
[3] Aylien Ltd, Dublin, Ireland
[4] Univ Cambridge, Language Technol Lab, Cambridge, England
Funding
Science Foundation Ireland
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Unsupervised machine translation (i.e., translation that assumes no cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but Lample et al. (2018a) nevertheless recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent-marking, and when the monolingual corpora come from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and we establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
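The abstract does not spell out how the identical-word signal is used, so the following is only a minimal sketch of one plausible first step: collecting words that are spelled the same in both vocabularies as seed translation pairs. The function name `identical_word_seed` and the toy vocabularies are hypothetical, and the later step of aligning the two embedding spaces from these seed pairs is deliberately left out.

```python
from typing import Dict, List, Tuple


def identical_word_seed(src_vocab: List[str], tgt_vocab: List[str]) -> List[Tuple[int, int]]:
    """Return (source_index, target_index) pairs for identically spelled words.

    Hypothetical helper: this illustrates the general idea of a weak
    supervision signal from identical strings, not the authors' code.
    """
    # Map each target word to the index of its first occurrence.
    tgt_index: Dict[str, int] = {}
    for j, word in enumerate(tgt_vocab):
        tgt_index.setdefault(word, j)

    pairs: List[Tuple[int, int]] = []
    seen = set()
    for i, word in enumerate(src_vocab):
        # Keep only the first occurrence of each shared word.
        if word in tgt_index and word not in seen:
            pairs.append((i, tgt_index[word]))
            seen.add(word)
    return pairs


# Toy vocabularies (assumed data, for illustration only).
src_vocab = ["the", "taxi", "hotel", "dog", "2018"]
tgt_vocab = ["el", "hotel", "perro", "taxi", "2018"]
seed_pairs = identical_word_seed(src_vocab, tgt_vocab)
```

In a full pipeline, index pairs like these would typically serve as the initial dictionary from which a mapping between the two embedding spaces is learned; shared strings such as numerals and loanwords are what make the signal available without any hand-built lexicon.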
Pages: 778-788
Page count: 11