Reconstructed similarity for faster GANs-based word translation to mitigate hubness

Cited by: 6
Authors
Zhang, Dejun [1]
Luo, Mengting [2]
He, Fazhi [3]
Affiliations
[1] China Univ Geosci, Fac Informat Engn, Wuhan 430074, Hubei, Peoples R China
[2] Sichuan Agr Univ, Coll Informat & Engn, Yaan 625014, Peoples R China
[3] Wuhan Univ, Sch Comp, Wuhan 430072, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Neural machine word translation; Bilingual word embeddings; Generative adversarial nets; Hubness problem; Reconstructed similarity; REPRESENTATION;
DOI
10.1016/j.neucom.2019.06.082
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In machine word translation, Nearest Neighbor (NN) retrieval is able to search the best-k translation candidates as linguistic labels of a source query from a unified multilingual semantic feature space. However, NN is polluted by hubs in the high-dimensional feature space. Many proposed approaches remove hubs in the list of translation candidates to relieve this problem. But those approaches to eliminating hubs are flawed because they also have corresponding translations. To address this issue, we propose a novel Reconstructed Similarity (RS) retrieval for the neural machine word translation model to mitigate the hubness problem regardless of whether it is a hub. Different from previous work, RS reduces the impact of hubness pollution in dense and high-dimensional space and allows the hubs to have the same probability as the target candidates without being inappropriately excluded. In addition, RS improves the quality of bilingual dictionaries by measuring the bilateral similarity of the bilingual and monolingual distance of each of the source query embeddings. Additionally, to model the unsupervised machine word translation, we introduce Generative Adversarial Nets (GANs) to map the source and target word distribution into a shared semantic space. We also construct a tiny GAN topology for neural machine word translation, which is at least 52x faster than previous GAN-based models. To further align cross-lingual embedding distributions, we provide orthogonal Procrustes mapping, global-awareness of the transformation matrix and rescaling of the target embeddings as flexible and optional multirefinements. The results show that our model outperforms the state-of-the-art by nearly 4% in distant languages such as English to Finnish. Compared with a precision@1 of 47.00% from English to Finnish, our model obtains a precision@1 of 47.53% and achieves state-of-the-art results in a fully unsupervised form. Moreover, our model achieves competitive results in the shortest time among GAN-based models, which easily trade off between speed and accuracy. © 2019 Elsevier B.V. All rights reserved.
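Illustrative note: the hubness problem mentioned in the abstract can be made concrete with a small sketch. The code below is not the paper's Reconstructed Similarity method, whose details are not given in this record. It only shows plain cosine-similarity NN retrieval on made-up 2-D vectors and counts how often each target is returned as the top-1 neighbor (the usual k-occurrence measure of hubness). All names (`cosine`, `nearest_neighbors`, `k_occurrence`) and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, targets, k=1):
    """Indices of the k targets most similar to the query."""
    ranked = sorted(
        range(len(targets)),
        key=lambda j: cosine(query, targets[j]),
        reverse=True,
    )
    return ranked[:k]


def k_occurrence(queries, targets, k=1):
    """Count how often each target index appears in a top-k list."""
    counts = Counter()
    for q in queries:
        counts.update(nearest_neighbors(q, targets, k))
    return counts


# Hypothetical toy embeddings: target 0 sits "between" the others,
# so it attracts most queries and behaves like a hub.
targets = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
queries = [[1.0, 0.6], [0.6, 1.0], [1.0, 0.9], [1.0, 0.1]]

counts = k_occurrence(queries, targets, k=1)
for idx in range(len(targets)):
    print(f"target {idx} retrieved {counts[idx]} time(s)")
```

In this toy setup one target absorbs most of the queries while another is never retrieved, which is the skew that hubness-aware retrieval schemes try to correct without simply discarding the hub.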
Pages: 83-93
Page count: 11
Related papers
5 in total
  • [1] Word-to-word Machine Translation: Bilateral Similarity Retrieval for Mitigating Hubness
    Luo, Mengting
    He, Linchao
    Guo, Mingyue
    Han, Fei
    Tian, Long
    Pu, Haibo
    Zhang, Dejun
    2019 THE 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, CONTROL AND ROBOTICS (EECR 2019), 2019, 533
  • [2] Research on English-Chinese machine translation shift based on word vector similarity
    Ma, Qingqing
    ARTIFICIAL LIFE AND ROBOTICS, 2024, 29 (04) : 585 - 589
  • [3] Extended Word Similarity Based Clustering on Unsupervised PoS Induction to Improve English-Indonesian Statistical Machine Translation
    Sujaini, Herry
    Arman, Arry Akhmad
    Purwarianti, Ayu
    Kuspriyanto
    2013 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2013 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE), 2013,
  • [4] Research on Improved Sentence Similarity Calculation Method Based on Word2Vec and Synonym Table in Interactive Machine Translation
    Tian Hongnan
    Guo Xin
    2021 5TH INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION SCIENCES (ICRAS 2021), 2021, : 255 - 261
  • [5] Evaluating the impact of some linguistic information on the performances of a similarity-based and translation-oriented Word-Sense disambiguation method
    Rakho, Myriam
    Constant, Matthieu
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010,