Word-to-word Machine Translation: Bilateral Similarity Retrieval for Mitigating Hubness

Cited by: 0
Authors
Luo, Mengting [1 ,2 ]
He, Linchao [1 ,2 ]
Guo, Mingyue [1 ,2 ]
Han, Fei [1 ,2 ]
Tian, Long [1 ,2 ]
Pu, Haibo [1 ,2 ]
Zhang, Dejun [3 ]
Affiliations
[1] Sichuan Agr Univ, Lab Agr Informat Engn, Yaan 0086625014, Peoples R China
[2] Key Lab Agr Informat Engn Sichuan Prov, Yaan 0086625014, Peoples R China
[3] China Univ Geosci, Fac Informat Engn, Wuhan 0086430074, Hubei, Peoples R China
Source
2019 THE 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, CONTROL AND ROBOTICS (EECR 2019) | 2019 / Vol. 533
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1088/1757-899X/533/1/012051
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Nearest neighbor search plays a critical role in machine word translation, owing to its ability to obtain the lingual labels of source word embeddings by retrieving the k Nearest Neighbor (kNN) target embeddings in a shared bilingual semantic space. However, aligning two language distributions into a shared space usually requires large amounts of target labels, and kNN retrieval suffers from the hubness problem in high-dimensional feature spaces. Although most best-k retrieval methods remove hubs from the list of translation candidates to mitigate the hubness problem, simply eliminating hubs is flawed, because a hub may itself be the correct translation of some source word query and should not be crudely excluded. In this paper, we introduce an unsupervised machine word translation model based on Generative Adversarial Nets (GANs) with Bilingual Similarity retrieval, namely Unsupervised-BSMWT. Our model addresses three main challenges: (1) it reduces the dependence on parallel data by using GANs in a fully unsupervised way; (2) it significantly decreases the training time of the adversarial game; (3) it proposes a novel Bilingual Similarity retrieval that mitigates hubness pollution for every candidate, regardless of whether it is a hub. Our model efficiently achieves competitive results in 74 minutes, exceeding previous GAN-based models.
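The bidirectional retrieval idea sketched in the abstract (scoring a target candidate by its similarity to the source query relative to each side's own neighbourhood, rather than excluding hubs outright) is in the same spirit as the CSLS criterion known from the cross-lingual embedding literature. The Python sketch below shows that generic CSLS-style correction for illustration only; it is not the paper's exact Bilingual Similarity retrieval, and the function name csls_retrieve, the neighbourhood size k, and the random embeddings are placeholders.

import numpy as np

def csls_retrieve(src_emb, tgt_emb, k=10):
    # Cosine similarities between every source query and every target candidate
    # (embeddings are assumed L2-normalized and already mapped into the shared
    # bilingual space, e.g. by an adversarially learned mapping).
    sims = src_emb @ tgt_emb.T
    # Mean similarity of each target word to its k nearest source queries:
    # hub targets, which sit close to many queries, receive a large penalty.
    r_tgt = np.mean(np.sort(sims, axis=0)[-k:, :], axis=0)
    # Mean similarity of each source query to its k nearest target candidates.
    r_src = np.mean(np.sort(sims, axis=1)[:, -k:], axis=1)
    # Bidirectional score: reward closeness, penalize dense neighbourhoods,
    # without removing any candidate from the list.
    scores = 2.0 * sims - r_src[:, None] - r_tgt[None, :]
    return np.argmax(scores, axis=1)

# Toy usage with random unit vectors standing in for aligned word embeddings.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 8)); src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt = rng.normal(size=(7, 8)); tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
print(csls_retrieve(src, tgt, k=3))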
Pages: 9
Related Papers
50 records in total
  • [41] Explicitly Modeling Word Translations in Neural Machine Translation
    Han, Dong
    Li, Junhui
    Li, Yachao
    Zhang, Min
    Zhou, Guodong
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (01)
  • [42] HMM word and phrase alignment for statistical machine translation
    Deng, Yonggang
    Byrne, William
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (03) : 494 - 507
  • [43] Source-Word Decomposition for Neural Machine Translation
    Thien Nguyen
    Hoai Le
    Van-Huy Pham
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [44] Word Sense Disambiguation in English to Hindi Machine Translation
    Kolte, S. G.
    Bhirud, S. G.
    SOFTWARE AND COMPUTER APPLICATIONS, 2011, 9 : 285 - 290
  • [45] BPE beyond Word Boundary: How NOT to use Multi Word Expressions in Neural Machine Translation
    Kumar, Dipesh
    Thawani, Avijit
    PROCEEDINGS OF THE THIRD WORKSHOP ON INSIGHTS FROM NEGATIVE RESULTS IN NLP (INSIGHTS 2022), 2022, : 172 - 179
  • [46] Semantic access in number word translation - The role of crosslingual lexical similarity
    Duyck, Wouter
    Brysbaert, Marc
    EXPERIMENTAL PSYCHOLOGY, 2008, 55 (02) : 102 - 112
  • [47] Integration of speech recognition and machine translation: Speech recognition word lattice translation
    Zhang, RQ
    Kikui, G
    SPEECH COMMUNICATION, 2006, 48 (3-4) : 321 - 334
  • [48] Machine translation and human translation of multi-word expressions: peeling this pineapple
    Rebechi, Rozane Rodrigues
    Marcon, Nathalia Oliva
    Faller, Guilherme de Almeida
    REVISTA VIRTUAL DE ESTUDOS DA LINGUAGEM-REVEL, 2025, 23 (44) : 346 - 380
  • [49] Improving Word Alignment for Statistical Machine Translation based on Constraints
    Le Quang Hung
    Le Anh Cuong
    2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012), 2012, : 113 - 116
  • [50] Accurate Word Alignment Induction from Neural Machine Translation
    Chen, Yun
    Liu, Yang
    Chen, Guanhua
    Jiang, Xin
    Liu, Qun
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 566 - 576