Reconstructed similarity for faster GANs-based word translation to mitigate hubness

Cited by: 6

Authors:

Zhang, Dejun [1]

Luo, Mengting [2]

He, Fazhi [3]

Affiliations:

[1] China Univ Geosci, Fac Informat Engn, Wuhan 430074, Hubei, Peoples R China

[2] Sichuan Agr Univ, Coll Informat & Engn, Yaan 625014, Peoples R China

[3] Wuhan Univ, Sch Comp, Wuhan 430072, Hubei, Peoples R China

Source:

NEUROCOMPUTING | 2019 / Vol. 362

Funding:

National Natural Science Foundation of China;

Keywords:

Neural machine word translation; Bilingual word embeddings; Generative adversarial nets; Hubness problem; Reconstructed similarity; REPRESENTATION;

DOI:

10.1016/j.neucom.2019.06.082

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In machine word translation, Nearest Neighbor (NN) retrieval searches a unified multilingual semantic feature space for the best-k translation candidates, which serve as linguistic labels for a source query. However, NN retrieval is polluted by hubs in the high-dimensional feature space. Many existing approaches relieve this problem by removing hubs from the list of translation candidates. Such hub elimination is flawed, however, because hubs can themselves have corresponding translations. To address this issue, we propose a novel Reconstructed Similarity (RS) retrieval for the neural machine word translation model that mitigates the hubness problem regardless of whether a candidate is a hub. Unlike previous work, RS reduces the impact of hubness pollution in dense, high-dimensional space and gives hubs the same probability as other target candidates instead of excluding them inappropriately. In addition, RS improves the quality of bilingual dictionaries by measuring a bilateral similarity based on both the bilingual and the monolingual distances of each source query embedding. To model unsupervised machine word translation, we introduce Generative Adversarial Nets (GANs) to map the source and target word distributions into a shared semantic space. We also construct a tiny GAN topology for neural machine word translation that is at least 52× faster than previous GAN-based models. To further align cross-lingual embedding distributions, we provide orthogonal Procrustes mapping, global awareness of the transformation matrix, and rescaling of the target embeddings as flexible, optional multi-refinements. The results show that our model outperforms the state of the art by nearly 4% on distant language pairs such as English to Finnish. For English to Finnish, our model obtains a precision@1 of 47.53%, compared with 47.00%, and achieves state-of-the-art results in a fully unsupervised form.
Moreover, our model achieves competitive results in the shortest time among GAN-based models, offering an easy trade-off between speed and accuracy. (C) 2019 Elsevier B.V. All rights reserved.
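The record does not specify how RS is computed, so the sketch below does not implement RS. It is a minimal NumPy illustration, under toy assumptions, of the standard ingredients the abstract names: plain cosine NN retrieval (the baseline that hubs pollute), a k-occurrence count for spotting hubs, the closed-form orthogonal Procrustes refinement, and precision@1 scoring. The helper names (`procrustes`, `nn_retrieve`, `k_occurrence`, `precision_at_1`) and the synthetic "source is an exact rotation of target" data are hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical helpers for illustration only; this is NOT the paper's RS retrieval.


def procrustes(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F, with row i of X paired to row i of Y.

    Closed form: if U @ diag(S) @ Vt is the SVD of X.T @ Y, then W = U @ Vt.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def normalize(E):
    # Scale rows to unit length so that dot products become cosine similarities.
    return E / np.linalg.norm(E, axis=1, keepdims=True)


def nn_retrieve(src, tgt, k=1):
    # Plain NN: for each source row, indices of the k most cosine-similar target rows.
    sims = normalize(src) @ normalize(tgt).T
    return np.argsort(-sims, axis=1)[:, :k]


def k_occurrence(topk, n_targets):
    # How many source queries list each target in their top-k; unusually large counts mark hubs.
    return np.bincount(topk.ravel(), minlength=n_targets)


def precision_at_1(topk, gold):
    # Fraction of queries whose top-ranked candidate is the gold translation index.
    return float(np.mean(topk[:, 0] == gold))


# Toy data (assumption): the "source" space is an exact rotation of the "target" space.
rng = np.random.default_rng(0)
n, d = 200, 50
Y = rng.standard_normal((n, d))                    # toy target-language embeddings
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # hidden orthogonal map
X = Y @ Q.T                                        # toy source embeddings, so X @ Q == Y

W = procrustes(X, Y)
top1 = nn_retrieve(X @ W, Y, k=1)
hub_counts = k_occurrence(nn_retrieve(X @ W, Y, k=10), n)

print(np.allclose(W, Q, atol=1e-6))                # rotation recovered on this noise-free toy
print(precision_at_1(top1, np.arange(n)))          # every query retrieves its gold target here
print(hub_counts.max())                            # largest k-occurrence among targets
```

On this noise-free toy, Procrustes recovers the hidden rotation and precision@1 is 1.0. On real high-dimensional embeddings, a few targets typically show very large k-occurrence counts; these are the hubs that plain NN retrieval over-selects, which RS is described as handling without excluding them.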


Pages: 83-93

Page count: 11

Related documents (5 in total):

[1] Word-to-word Machine Translation: Bilateral Similarity Retrieval for Mitigating Hubness
Luo, Mengting; He, Linchao; Guo, Mingyue; Han, Fei; Tian, Long; Pu, Haibo; Zhang, Dejun
2019 THE 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, CONTROL AND ROBOTICS (EECR 2019), 2019, 533
[2] Research on English-Chinese machine translation shift based on word vector similarity
Ma, Qingqing
ARTIFICIAL LIFE AND ROBOTICS, 2024, 29 (04): 585-589
[3] Extended Word Similarity Based Clustering on Unsupervised PoS Induction to Improve English-Indonesian Statistical Machine Translation
Sujaini, Herry; Arman, Arry Akhmad; Purwarianti, Ayu; Kuspriyanto
2013 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2013 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE), 2013
[4] Research on Improved Sentence Similarity Calculation Method Based on Word2Vec and Synonym Table in Interactive Machine Translation
Tian, Hongnan; Guo, Xin
2021 5TH INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION SCIENCES (ICRAS 2021), 2021: 255-261
[5] Evaluating the impact of some linguistic information on the performances of a similarity-based and translation-oriented Word-Sense disambiguation method
Rakho, Myriam; Constant, Matthieu
LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010
