Word-to-word Machine Translation: Bilateral Similarity Retrieval for Mitigating Hubness

Authors
Luo, Mengting [1 ,2 ]
He, Linchao [1 ,2 ]
Guo, Mingyue [1 ,2 ]
Han, Fei [1 ,2 ]
Tian, Long [1 ,2 ]
Pu, Haibo [1 ,2 ]
Zhang, Dejun [3 ]
Affiliations
[1] Sichuan Agr Univ, Lab Agr Informat Engn, Yaan 0086625014, Peoples R China
[2] Key Lab Agr Informat Engn Sichuan Prov, Yaan 0086625014, Peoples R China
[3] China Univ Geosci, Fac Informat Engn, Wuhan 0086430074, Hubei, Peoples R China
Source
2019 THE 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, CONTROL AND ROBOTICS (EECR 2019) | 2019 / Vol. 533
Funding
National Natural Science Foundation of China;
DOI
10.1088/1757-899X/533/1/012051
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Nearest neighbor search plays a critical role in word-level machine translation because it obtains the lingual labels of source word embeddings by retrieving their k Nearest Neighbor (k-NN) target embeddings from a shared bilingual semantic space. However, aligning two language distributions into a shared space usually requires a large amount of target labels, and k-NN retrieval suffers from the hubness problem in high-dimensional feature spaces. Most best-k retrieval methods mitigate hubness by removing hubs from the list of translation candidates, but eliminating hubs outright is flawed: a hub is also the correct translation of some source word query and should not be crudely excluded. In this paper, we introduce an unsupervised machine word translation model based on Generative Adversarial Nets (GANs) with Bilingual Similarity retrieval, namely Unsupervised-BSMWT. Our model addresses three main challenges: (1) it reduces the dependence on parallel data by using GANs in a fully unsupervised way; (2) it significantly decreases the training time of the adversarial game; (3) it proposes a novel Bilingual Similarity retrieval that mitigates hubness pollution regardless of whether a candidate is a hub. Our model achieves competitive results in 74 min, exceeding previous GANs-based models.
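The hubness problem mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' Bilingual Similarity retrieval, whose details this record does not give; it only shows plain cosine k-NN retrieval and counts how often each target embedding appears in the k-NN lists of the source queries (the k-occurrence). The function names and the toy two-dimensional vectors are hypothetical, chosen for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn(query, targets, k=1):
    """Indices of the k targets most cosine-similar to the query."""
    ranked = sorted(range(len(targets)),
                    key=lambda j: cosine(query, targets[j]),
                    reverse=True)
    return ranked[:k]

def k_occurrence(sources, targets, k=1):
    """For each target j, count the source queries whose k-NN list contains j."""
    counts = [0] * len(targets)
    for q in sources:
        for j in knn(q, targets, k):
            counts[j] += 1
    return counts

# Toy embeddings (hypothetical): three of four sources point toward target 0.
sources = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0]]
targets = [[1.0, 0.1], [0.2, 1.0], [-1.0, 0.0]]

counts = k_occurrence(sources, targets, k=1)
print(counts)  # [3, 1, 0]
```

In this toy case target 0 is retrieved by three of the four queries and so behaves like a hub, while target 2 is never retrieved. Target 0 is still the correct answer for at least one of those queries, which is the abstract's argument against discarding hubs outright.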
Pages: 9