Handling OOV Words in NMT Using Unsupervised Bilingual Embedding

Cited by: 0
Authors
Haddad, Hesam [1]
Fadaei, Hakimeh [1]
Faili, Heshaam [1]
Affiliations
[1] Univ Tehran, Dept Elect & Comp Engn, Tehran, Iran
Keywords
Machine Translation; Neural Machine Translation; Out-of-Vocabulary Words
DOI
Not available
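Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract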
Neural machine translation has recently become the premier approach in machine translation; however, it still has some unsolved issues. This paper focuses on handling out-of-vocabulary (OOV) words, an open problem in neural machine translation. The method we introduce chooses appropriate in-vocabulary alternatives for OOV words by considering word embeddings trained on monolingual corpora. Both monolingual and bilingual embeddings are used to find the proper substitute for each OOV word. Using this technique, we improve translation quality by up to 2.3 BLEU without using any additional annotated data.
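The substitution idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the substitute is simply the in-vocabulary word with the highest cosine similarity to the OOV word in a single monolingual embedding space, and it leaves out the bilingual embeddings the paper also uses. The function names, the toy vectors in EMB, and the `<unk>` fallback are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_in_vocab(word, embeddings, vocab):
    """Return the in-vocabulary word closest to `word`, or None if
    `word` has no embedding or no candidate has one."""
    if word not in embeddings:
        return None
    query = embeddings[word]
    best_word, best_score = None, -2.0  # cosine is always >= -1
    for cand in vocab:
        if cand == word or cand not in embeddings:
            continue
        score = cosine(query, embeddings[cand])
        if score > best_score:
            best_word, best_score = cand, score
    return best_word

def replace_oov(tokens, embeddings, vocab, unk="<unk>"):
    """Replace each OOV token with its nearest in-vocabulary neighbour.
    Falls back to `unk` when no substitute can be found (an assumption)."""
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
        else:
            sub = nearest_in_vocab(tok, embeddings, vocab)
            out.append(sub if sub is not None else unk)
    return out

# Toy monolingual embeddings (hypothetical values, for illustration only).
EMB = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.9],
    "drive": [0.6, 0.7, 0.1],
}
# NMT vocabulary: "automobile" is deliberately left out, so it is OOV.
VOCAB = {"car", "banana", "drive", "i", "a"}

print(replace_oov(["i", "drive", "a", "automobile"], EMB, VOCAB))
```

In this toy setup the OOV token "automobile" is mapped to "car" before the sentence would be passed to the translation model. A token with no embedding at all falls back to `<unk>`.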
Pages: 569-574
Number of pages: 6