Cross-Lingual Word Embeddings for Turkic Languages

Authors
Kuriyozov, Elmurod [1]
Doval, Yerai [2]
Gomez-Rodriguez, Carlos [1]
Affiliations
[1] Univ A Coruna, Fac Informat, Dept Comp & Tecnol Informac, CITIC,Grp LYS, Campus Elvina, La Coruna 15071, Spain
[2] Univ Vigo, Dept Informat & ES Enxenaria Informat, Grp COLE, Campus Lagoas, Orense 32004, Spain
Funding
European Research Council
Keywords
Less-Resourced/Endangered Languages; Multilinguality;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
There has been increasing interest in learning cross-lingual word embeddings to transfer knowledge obtained from a resource-rich language, such as English, to lower-resource languages for which annotated data is scarce, such as Turkish, Russian, and many others. In this paper, we present the first viability study of established techniques for aligning monolingual embedding spaces for Turkish, Uzbek, Azeri, Kazakh and Kyrgyz, members of the Turkic family, which is heavily affected by the low-resource constraint. These techniques are known to require little explicit supervision, mainly in the form of bilingual dictionaries, and are therefore easily adaptable to different domains, including low-resource ones. We obtain new bilingual dictionaries and new word embeddings for these languages and show the steps for obtaining cross-lingual word embeddings using state-of-the-art techniques. We then evaluate the results on the bilingual dictionary induction task. Our experiments confirm that the obtained bilingual dictionaries outperform previously available ones, and that word embeddings from a low-resource language can benefit from resource-rich, closely related languages when they are aligned together. Furthermore, evaluation on an extrinsic task (sentiment analysis on Uzbek) shows that monolingual word embeddings can benefit, although slightly, from cross-lingual alignments.
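The abstract does not name the specific alignment techniques used, so the following is only an illustrative sketch of one common dictionary-supervised approach: an orthogonal Procrustes mapping between two monolingual spaces, followed by nearest-neighbour bilingual dictionary induction. The function names, the synthetic data, and the train/test split are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def procrustes_align(src, tgt):
    """Return the orthogonal matrix W minimising ||src @ W - tgt||_F.

    src, tgt: arrays of shape (n_pairs, dim); row i of each holds the
    vectors of the i-th (source word, target word) dictionary pair.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def induce_translations(src_vecs, tgt_vecs, w):
    """Map source vectors with W and return, for each one, the index of
    the target vector with the highest cosine similarity."""
    mapped = src_vecs @ w
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt_n = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    return np.argmax(mapped @ tgt_n.T, axis=1)

# Synthetic stand-in for two monolingual embedding spaces (hypothetical data):
# the "source" space is an exact rotation of the "target" space.
rng = np.random.default_rng(0)
dim, n_words, n_train = 16, 200, 150
tgt_emb = rng.normal(size=(n_words, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_emb = tgt_emb @ rotation.T  # so that src_emb @ rotation == tgt_emb

# Learn the mapping from the "training dictionary" (first n_train pairs).
w = procrustes_align(src_emb[:n_train], tgt_emb[:n_train])

# Bilingual dictionary induction on the held-out pairs, scored as precision@1.
predicted = induce_translations(src_emb[n_train:], tgt_emb, w)
gold = np.arange(n_train, n_words)
precision_at_1 = float(np.mean(predicted == gold))
print(f"precision@1 on held-out pairs: {precision_at_1:.2f}")
```

On real embeddings the two spaces are only approximately related by a rotation, so held-out precision would be well below the idealised value this toy setup produces; retrieval criteria other than plain cosine nearest neighbour are also commonly used.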
Pages: 4054-4062
Number of pages: 9
Related papers
50 in total
  • [1] Cross-Lingual Word Embeddings
    Søgaard, Anders
    Vulić, Ivan
    Ruder, Sebastian
    Faruqui, Manaal
    [J]. Synthesis Lectures on Human Language Technologies, 2019, 12(02): 1-132
  • [2] Cross-Lingual Word Embeddings
    Corro, Caio Filippo
    [J]. TRAITEMENT AUTOMATIQUE DES LANGUES, 2019, 60(01): 46-48
  • [3] Cross-Lingual Word Embeddings
    Agirre, Eneko
    [J]. COMPUTATIONAL LINGUISTICS, 2020, 46(01): 245-248
  • [4] A Study of Efficacy of Cross-lingual Word Embeddings for Indian Languages
    Khatri, Jyotsana
    Murthy, Rudra
    Bhattacharyya, Pushpak
    [J]. PROCEEDINGS OF THE 7TH ACM IKDD CODS AND 25TH COMAD (CODS-COMAD 2020), 2020: 347-348
  • [5] Refinement of Unsupervised Cross-Lingual Word Embeddings
    Biesialska, Magdalena
    Costa-jussa, Marta R.
    [J]. ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325: 1978-1981
  • [6] Interactive Refinement of Cross-Lingual Word Embeddings
    Yuan, Michelle
    Zhang, Mozhi
    Van Durme, Benjamin
    Findlater, Leah
    Boyd-Graber, Jordan
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 5984-5996
  • [7] Improving Cross-Lingual Word Embeddings by Meeting in the Middle
    Doval, Yerai
    Camacho-Collados, Jose
    Espinosa-Anke, Luis
    Schockaert, Steven
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018: 294-304
  • [8] Delexicalized Word Embeddings for Cross-lingual Dependency Parsing
    Dehouck, Mathieu
    Denis, Pascal
    [J]. 15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017: 241-250
  • [9] Data Filtering using Cross-Lingual Word Embeddings
    Herold, Christian
    Rosendahl, Jan
    Vanvinckenroye, Joris
    Ney, Hermann
    [J]. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021: 162-172
  • [10] Cross-lingual Models of Word Embeddings: An Empirical Comparison
    Upadhyay, Shyam
    Faruqui, Manaal
    Dyer, Chris
    Roth, Dan
    [J]. PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016: 1661-1670