Word-to-word Machine Translation: Bilateral Similarity Retrieval for Mitigating Hubness

Cited by: 0

Authors:

Luo, Mengting [1,2]

He, Linchao [1,2]

Guo, Mingyue [1,2]

Han, Fei [1,2]

Tian, Long [1,2]

Pu, Haibo [1,2]

Zhang, Dejun [3]

Affiliations:

[1] Sichuan Agr Univ, Lab Agr Informat Engn, Yaan 0086625014, Peoples R China

[2] Key Lab Agr Informat Engn Sichuan Prov, Yaan 0086625014, Peoples R China

[3] China Univ Geosci, Fac Informat Engn, Wuhan 0086430074, Hubei, Peoples R China

Source:

2019 THE 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, CONTROL AND ROBOTICS (EECR 2019) | 2019 / Vol. 533

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

10.1088/1757-899X/533/1/012051

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Nearest neighbor search plays a critical role in machine word translation, because it obtains the lingual labels of source word embeddings by searching for the k nearest neighbor (k-NN) target embeddings in a shared bilingual semantic space. However, aligning two language distributions into a shared space usually requires large amounts of target labels, and k-NN retrieval causes the hubness problem in high-dimensional feature spaces. Although most best-k retrieval methods remove hubs from the list of translation candidates to mitigate the hubness problem, eliminating hubs is flawed, because a hub also has a correct source word query corresponding to it and should not be crudely excluded. In this paper, we introduce an unsupervised machine word translation model based on Generative Adversarial Nets (GANs) with Bilingual Similarity retrieval, namely Unsupervised-BSMWT. Our model addresses three main challenges: (1) it reduces the dependence on parallel data by using GANs in a fully unsupervised way; (2) it significantly decreases the training time of the adversarial game; (3) it proposes a novel Bilingual Similarity retrieval that mitigates hubness pollution regardless of whether a translation candidate is a hub. Our model efficiently achieves competitive results in 74 min, exceeding previous GAN-based models.
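The hubness problem mentioned in the abstract can be illustrated with a minimal sketch. This is plain cosine k-NN retrieval, not the paper's Bilingual Similarity retrieval, and every name and vector below (`knn_translate`, `hub_counts`, the toy 2-D embeddings) is a hypothetical chosen for illustration only. It shows how one target word can become the nearest neighbor of several source words.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_translate(source_vecs, target_vecs, k=1):
    """For each source word, return the k most similar target words."""
    result = {}
    for src_word, src_vec in source_vecs.items():
        ranked = sorted(
            target_vecs,
            key=lambda tgt: cosine(src_vec, target_vecs[tgt]),
            reverse=True,
        )
        result[src_word] = ranked[:k]
    return result


def hub_counts(translations):
    """Count how often each target word appears in any k-NN list."""
    return Counter(t for cands in translations.values() for t in cands)


# Toy embeddings (assumed values, already in a shared space).
source_vecs = {
    "chat": [1.0, 0.2],
    "chien": [0.9, 0.3],
    "maison": [0.2, 1.0],
}
target_vecs = {
    "cat": [1.0, 0.0],
    "dog": [0.6, 0.8],
    "house": [0.0, 1.0],
    "animal": [0.95, 0.2],
}

translations = knn_translate(source_vecs, target_vecs, k=1)
counts = hub_counts(translations)
print(translations)
print(counts.most_common())
```

In this toy setup "animal" is retrieved for both "chat" and "chien", so it acts as a hub. Simply deleting it from the candidate list would also remove it for any query whose correct translation it is, which is the flaw the abstract points out.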


Pages: 9

50 records in total

[31] Three algorithms for word-to-phrase machine translation
Le Manh Hai
Phan Thi Tuoi
2009 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2009, : 328 - 331
[32] Word-level confidence estimation for machine translation
Ueffing, Nicola
Ney, Hermann
COMPUTATIONAL LINGUISTICS, 2007, 33 (01) : 9 - 40
[33] Measuring word alignment quality for statistical machine translation
Fraser, Alexander
Marcu, Daniel
COMPUTATIONAL LINGUISTICS, 2007, 33 (03) : 293 - 303
[34] A Novel Word Reordering Method for Statistical Machine Translation
Zang, Shuo
Zhao, Hai
Wu, Chunyang
Wang, Rui
2015 12TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2015, : 843 - 848
[35] Towards Understanding Neural Machine Translation with Word Importance
He, Shilin
Tu, Zhaopeng
Wang, Xing
Wang, Longyue
Lyu, Michael R.
Shi, Shuming
2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 953 - 962
[36] Exploiting Rules for Word Sense Disambiguation in Machine Translation
Specia, Lucia
Nunes, Maria das Gracas V.
Stevenson, Mark
PROCESAMIENTO DEL LENGUAJE NATURAL, 2005, (35) : 171 - 178
[37] Improvements on automatic word codification for Connectionist machine translation
Casan, GA
Castaño, MA
ECAI 2004: 16TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 110 : 576 - 580
[38] A Cognitive Model of Chinese Word Segmentation for Machine Translation
Wu, Zhijie
META, 2011, 56 (03) : 631 - 644
[39] Better Addressing Word Deletion for Statistical Machine Translation
Li, Qiang
Zhang, Dongdong
Li, Mu
Xiao, Tong
Zhu, Jingbo
NATURAL LANGUAGE UNDERSTANDING AND INTELLIGENT APPLICATIONS (NLPCC 2016), 2016, 10102 : 91 - 102
[40] An Experiment of Word Sense Disambiguation in a Machine Translation System
Faili, Heshaam
IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2008, : 28 - 34
