A Study of Efficacy of Cross-lingual Word Embeddings for Indian Languages

Cited by: 2
Authors
Khatri, Jyotsana [1 ]
Murthy, Rudra [1 ]
Bhattacharyya, Pushpak [1 ]
Affiliations
[1] Indian Inst Technol, Mumbai, Maharashtra, India
DOI
10.1145/3371158.3371219
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Cross-lingual word embeddings have become ubiquitous in NLP tasks. The existing literature primarily evaluates the quality of cross-lingual word embeddings on the task of Bilingual Lexicon Induction (BLI) and reports very high accuracies for European languages. In this paper, we report BLI accuracy for cross-lingual word embeddings generated with two mapping-based unsupervised approaches, VecMap and MUSE, for Indian languages, on a dataset created using the linked Indian Wordnet. We also compare these approaches with a simple baseline in which the embeddings for all languages are trained using fastText on the combined corpora of 11 Indian languages. Our experiments show that existing cross-lingual word embedding approaches give low accuracy on bilingual lexicon induction for cognate words. Given the high cognate overlap among several Indian languages, this is a serious limitation of existing approaches.
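As a rough illustration of the BLI accuracy the abstract refers to, the sketch below scores precision@1: for each source word, it retrieves the nearest target word in an already-aligned embedding space and checks it against a gold dictionary. The function names, the plain cosine retrieval, and the dictionary-of-lists data layout are all assumptions made for illustration; the abstract does not specify the retrieval criterion or the data format used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bli_precision_at_1(src_emb, tgt_emb, gold_pairs):
    """Fraction of source words whose nearest target word is a gold translation.

    src_emb, tgt_emb: dict mapping word -> vector, assumed to share one space
                      (hypothetical layout, e.g. after a mapping step).
    gold_pairs:       dict mapping source word -> set of acceptable target words.
    """
    hits = 0
    total = 0
    for src_word, gold in gold_pairs.items():
        if src_word not in src_emb:
            continue  # out-of-vocabulary source words are skipped in this sketch
        query = src_emb[src_word]
        # Nearest neighbour by cosine similarity over the whole target vocabulary.
        best = max(tgt_emb, key=lambda t: cosine(query, tgt_emb[t]))
        hits += best in gold
        total += 1
    return hits / total if total else 0.0
```

With toy two-dimensional vectors in which one of two source words retrieves its gold translation, the function returns 0.5. Real evaluations typically run over thousands of dictionary entries and may use a different retrieval criterion.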
Pages: 347-348
Number of pages: 2