Learning Bilingual Lexicon for Low-Resource Language Pairs

Times cited: 0
Authors
Zhu, ShaoLin [1 ,2 ,3 ]
Li, Xiao [1 ,2 ]
Yang, YaTing [1 ,2 ]
Wang, Lei [1 ,2 ]
Mi, ChengGang [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi, Peoples R China
[2] Key Lab Speech Language Informat Proc Xinjiang, Urumqi, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
West Light Foundation of the Chinese Academy of Sciences
DOI
10.1007/978-3-319-73618-1_66
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning a bilingual lexicon from monolingual data is a novel idea in natural language processing that can benefit many low-resource language pairs. In this paper, we present an approach for obtaining a bilingual lexicon from monolingual data. Our method requires only a small seed bilingual lexicon, and we use Canonical Correlation Analysis (CCA) to construct a shared latent space that explains how the two monolingual embedding spaces are linked. Experimental results show that a bilingual lexicon of considerable precision and size can be learned from Chinese-Uyghur and Chinese-Kazakh monolingual data.
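The CCA idea in the abstract can be illustrated with a minimal sketch. It assumes that pre-trained monolingual embeddings are available as NumPy arrays, that the seed lexicon is given as row-aligned pairs (row i of the source matrix translates row i of the target matrix), and that new translations are retrieved by cosine nearest neighbour in the shared CCA space. The function names (`fit_cca`, `project`, `induce_lexicon`), the ridge term `reg`, and the synthetic data are illustrative assumptions. This record does not state the paper's dimensionality, regularization, or retrieval criterion, so this is not the authors' implementation.

```python
import numpy as np


def _inv_sqrt(c: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return (vecs / np.sqrt(vals)) @ vecs.T


def fit_cca(x: np.ndarray, y: np.ndarray, k: int, reg: float = 1e-3):
    """Fit linear CCA on seed-lexicon pairs (row i of x translates row i of y).

    Returns the means and projections (wx, wy) into a shared k-dim latent space,
    plus the top-k canonical correlations. `reg` (assumed) is a ridge term that
    keeps the covariance matrices invertible.
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    n = x.shape[0]
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    cxx_is, cyy_is = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # SVD of the whitened cross-covariance gives paired canonical directions.
    u, s, vt = np.linalg.svd(cxx_is @ cxy @ cyy_is, full_matrices=False)
    wx = cxx_is @ u[:, :k]
    wy = cyy_is @ vt.T[:, :k]
    return mx, my, wx, wy, s[:k]


def project(emb: np.ndarray, mean: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Map embeddings into the shared space and L2-normalise them."""
    z = (emb - mean) @ w
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def induce_lexicon(src_emb: np.ndarray, tgt_emb: np.ndarray, model) -> np.ndarray:
    """For each source word, return the index of its cosine-nearest target word."""
    mx, my, wx, wy, _ = model
    sims = project(src_emb, mx, wx) @ project(tgt_emb, my, wy).T
    return sims.argmax(axis=1)


# Synthetic demo (hypothetical data): both "languages" are noisy linear views
# of the same latent word meanings.
rng = np.random.default_rng(0)
n_words, latent_dim, d_src, d_tgt, n_seed = 500, 10, 20, 30, 200
z = rng.standard_normal((n_words, latent_dim))
src = z @ rng.standard_normal((latent_dim, d_src)) + 0.01 * rng.standard_normal((n_words, d_src))
tgt = z @ rng.standard_normal((latent_dim, d_tgt)) + 0.01 * rng.standard_normal((n_words, d_tgt))

# The first n_seed aligned rows serve as the seed lexicon; the rest are held out.
model = fit_cca(src[:n_seed], tgt[:n_seed], k=latent_dim)
pred = induce_lexicon(src[n_seed:], tgt[n_seed:], model)
precision = float(np.mean(pred == np.arange(n_words - n_seed)))
print(f"precision@1 on held-out words: {precision:.2f}")
```

On real embeddings the two spaces are far less linearly related than in this toy setup, so precision will be much lower. The ridge term matters most when the seed lexicon is small relative to the embedding dimensionality, which is typical for low-resource pairs such as Chinese-Uyghur or Chinese-Kazakh.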
Pages: 760-770
Number of pages: 11