Handling OOV Words in NMT Using Unsupervised Bilingual Embedding

Cited by: 0
Authors
Haddad, Hesam [1]
Fadaei, Hakimeh [1]
Faili, Heshaam [1]
Affiliations
[1] Univ Tehran, Dept Elect & Comp Engn, Tehran, Iran
Keywords
Machine Translation; Neural Machine Translation; Out-of-Vocabulary Words;
DOI
Not available
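Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification code
0808; 0809;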
Abstract
Neural machine translation has recently become the premier approach in machine translation; however, it still has some unsolved issues. In this paper, we focus on handling out-of-vocabulary (OOV) words, an open problem in neural machine translation. The method we introduce chooses appropriate in-vocabulary alternatives for OOV words by considering word embeddings trained on monolingual corpora. Both monolingual and bilingual embeddings are used to find the proper substitute for each OOV word. Using this technique, we improve translation quality by up to 2.3 BLEU without using any additional annotated data.
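The substitution idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only a monolingual nearest-neighbor lookup by cosine similarity, it leaves out the bilingual embedding step entirely, and the toy vectors, vocabulary, and function names (`cosine`, `nearest_in_vocab`, `replace_oov`) are assumptions made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_in_vocab(word, embeddings, vocab):
    """Return the in-vocabulary word whose embedding is closest to `word`, or None."""
    if word not in embeddings:
        return None  # no embedding available, so no substitute can be proposed
    candidates = [w for w in vocab if w in embeddings and w != word]
    if not candidates:
        return None
    return max(candidates, key=lambda w: cosine(embeddings[word], embeddings[w]))


def replace_oov(tokens, embeddings, vocab, unk="<unk>"):
    """Replace each OOV token with its nearest in-vocabulary neighbor, else `unk`."""
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
        else:
            out.append(nearest_in_vocab(tok, embeddings, vocab) or unk)
    return out


# Hypothetical toy data: 3-dimensional "embeddings" and a tiny NMT vocabulary.
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.9],
    "drives": [0.3, 0.9, 0.1],
}
nmt_vocab = {"car", "banana", "drives"}

result = replace_oov(["automobile", "drives", "zzz"], toy_embeddings, nmt_vocab)
print(result)
```

In this toy setup, "automobile" is outside the NMT vocabulary but has an embedding, so it is mapped to its closest in-vocabulary neighbor, while "zzz" has no embedding and falls back to the unknown token. A real system would use embeddings trained on large monolingual corpora and, as the abstract notes, would also draw on bilingual embeddings when choosing the substitute.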
Pages: 569-574
Page count: 6