Word-to-word Machine Translation: Bilateral Similarity Retrieval for Mitigating Hubness

Cited by: 0
Authors
Luo, Mengting [1 ,2 ]
He, Linchao [1 ,2 ]
Guo, Mingyue [1 ,2 ]
Han, Fei [1 ,2 ]
Tian, Long [1 ,2 ]
Pu, Haibo [1 ,2 ]
Zhang, Dejun [3 ]
Affiliations
[1] Sichuan Agr Univ, Lab Agr Informat Engn, Yaan 0086625014, Peoples R China
[2] Key Lab Agr Informat Engn Sichuan Prov, Yaan 0086625014, Peoples R China
[3] China Univ Geosci, Fac Informat Engn, Wuhan 0086430074, Hubei, Peoples R China
Source
2019 THE 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, CONTROL AND ROBOTICS (EECR 2019) | 2019 / Vol. 533
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1088/1757-899X/533/1/012051
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Nearest neighbor search plays a critical role in machine word translation, owing to its ability to obtain the lingual labels of source word embeddings by retrieving the k Nearest Neighbor (kNN) target embeddings in a shared bilingual semantic space. However, aligning two language distributions into a shared space usually requires large amounts of target labels, and kNN retrieval suffers from the hubness problem in high-dimensional feature spaces. Although most best-k retrieval methods remove hubs from the list of translation candidates to mitigate the hubness problem, simply eliminating hubs is flawed, because a hub may itself be the correct translation of some source word query and should not be crudely excluded. In this paper, we introduce an unsupervised machine word translation model based on Generative Adversarial Nets (GANs) with Bilingual Similarity retrieval, namely Unsupervised-BSMWT. Our model addresses three main challenges: (1) it reduces the dependence on parallel data by using GANs in a fully unsupervised way; (2) it significantly decreases the training time of the adversarial game; (3) it proposes a novel Bilingual Similarity retrieval that mitigates hubness pollution for every candidate, regardless of whether it is a hub. Our model efficiently achieves competitive results in 74 minutes, exceeding previous GAN-based models.
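The bidirectional retrieval idea sketched in the abstract (scoring a target candidate by its similarity to the source query relative to each side's own neighbourhood, rather than excluding hubs outright) is in the same spirit as the CSLS criterion known from the cross-lingual embedding literature. The Python sketch below shows that generic CSLS-style correction for illustration only; it is not the paper's exact Bilingual Similarity retrieval, and the function name csls_retrieve, the neighbourhood size k, and the random embeddings are placeholders.

import numpy as np

def csls_retrieve(src_emb, tgt_emb, k=10):
    # Cosine similarities between every source query and every target candidate
    # (embeddings are assumed L2-normalized and already mapped into the shared
    # bilingual space, e.g. by an adversarially learned mapping).
    sims = src_emb @ tgt_emb.T
    # Mean similarity of each target word to its k nearest source queries:
    # hub targets, which sit close to many queries, receive a large penalty.
    r_tgt = np.mean(np.sort(sims, axis=0)[-k:, :], axis=0)
    # Mean similarity of each source query to its k nearest target candidates.
    r_src = np.mean(np.sort(sims, axis=1)[:, -k:], axis=1)
    # Bidirectional score: reward closeness, penalize dense neighbourhoods,
    # without removing any candidate from the list.
    scores = 2.0 * sims - r_src[:, None] - r_tgt[None, :]
    return np.argmax(scores, axis=1)

# Toy usage with random unit vectors standing in for aligned word embeddings.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 8)); src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt = rng.normal(size=(7, 8)); tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
print(csls_retrieve(src, tgt, k=3))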
Pages: 9
Related Papers
50 records in total
  • [41] Explicitly Modeling Word Translations in Neural Machine Translation
    Han, Dong
    Li, Junhui
    Li, Yachao
    Zhang, Min
    Zhou, Guodong
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (01)
  • [42] HMM word and phrase alignment for statistical machine translation
    Deng, Yonggang
    Byrne, William
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (03) : 494 - 507
  • [43] Source-Word Decomposition for Neural Machine Translation
    Thien Nguyen
    Hoai Le
    Van-Huy Pham
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [44] Word Sense Disambiguation in English to Hindi Machine Translation
    Kolte, S. G.
    Bhirud, S. G.
    SOFTWARE AND COMPUTER APPLICATIONS, 2011, 9 : 285 - 290
  • [45] BPE beyond Word Boundary: How NOT to use Multi Word Expressions in Neural Machine Translation
    Kumar, Dipesh
    Thawani, Avijit
    PROCEEDINGS OF THE THIRD WORKSHOP ON INSIGHTS FROM NEGATIVE RESULTS IN NLP (INSIGHTS 2022), 2022, : 172 - 179
  • [46] Semantic access in number word translation - The role of crosslingual lexical similarity
    Duyck, Wouter
    Brysbaert, Marc
    EXPERIMENTAL PSYCHOLOGY, 2008, 55 (02) : 102 - 112
  • [47] Integration of speech recognition and machine translation: Speech recognition word lattice translation
    Zhang, RQ
    Kikui, G
    SPEECH COMMUNICATION, 2006, 48 (3-4) : 321 - 334
  • [48] Machine translation and human translation of multi-word expressions: peeling this pineapple
    Rebechi, Rozane Rodrigues
    Marcon, Nathalia Oliva
    Faller, Guilherme de Almeida
    REVISTA VIRTUAL DE ESTUDOS DA LINGUAGEM-REVEL, 2025, 23 (44) : 346 - 380
  • [49] Improving Word Alignment for Statistical Machine Translation based on Constraints
    Le Quang Hung
    Le Anh Cuong
    2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012), 2012, : 113 - 116
  • [50] Accurate Word Alignment Induction from Neural Machine Translation
    Chen, Yun
    Liu, Yang
    Chen, Guanhua
    Jiang, Xin
    Liu, Qun
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 566 - 576