A Survey of Neural Machine Translation

Cited: 0
Authors
Li Y.-C. [1 ,2 ]
Xiong D.-Y. [1 ]
Zhang M. [1 ]
Affiliations
[1] School of Computer Science and Technology, Soochow University, Suzhou, 215006, Jiangsu
[2] Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou
Source
Zhang, Min (minzhang@suda.edu.cn) | 2018 / Science Press / Vol. 41
Funding
National Natural Science Foundation of China
Keywords
Attention mechanism; Machine translation; Machine translation evaluation; Neural machine translation; Recurrent neural network; Sequence-to-sequence model;
DOI
10.11897/SP.J.1016.2018.02734
Abstract
Machine translation, a subfield of artificial intelligence and natural language processing, investigates how to transform a source language into a target language. Neural machine translation (NMT) is a recently proposed framework based purely on sequence-to-sequence models, in which a large neural network transforms the source-language sequence into the target-language sequence, establishing a novel paradigm for machine translation. After years of development, NMT has achieved substantial results and has gradually surpassed statistical machine translation (SMT) on various language pairs, becoming a new machine translation approach with great potential. In this paper, we systematically describe the vanilla NMT model and the different types of NMT models, organized by the principles of the classical NMT model, the common problems shared by NMT models, novel models and architectures, and other classification criteria. First, we introduce encoder-decoder based NMT together with its problems and challenges. In the vanilla NMT model, the encoder, implemented as a recurrent neural network (RNN), reads the input sequence and produces a fixed-length vector, from which the decoder generates a sequence of target-language words. The biggest issue with the vanilla model is that a sentence of any length must be compressed into a fixed-length vector, which may lose important information of the sentence and is therefore a bottleneck in NMT. Next, we summarize the neural networks used in NMT, including RNNs, convolutional neural networks (CNNs), long short-term memory (LSTM) networks, gated recurrent neural networks, neural Turing machines (NTMs), and memory networks.
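The vanilla encoder-decoder scheme and its fixed-length bottleneck described above can be sketched in a few lines of NumPy. All dimensions, parameters, and the greedy decoding loop here are illustrative assumptions (an untrained toy model), not the survey's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_emb, d_hid, vocab = 8, 16, 50           # toy dimensions (assumed)

# Random parameters; a real model would learn these by gradient descent.
W_xh = rng.normal(size=(d_emb, d_hid)) * 0.1
W_hh = rng.normal(size=(d_hid, d_hid)) * 0.1
W_hy = rng.normal(size=(d_hid, vocab)) * 0.1
emb  = rng.normal(size=(vocab, d_emb)) * 0.1

def encode(src_ids):
    """Compress the whole source sentence into ONE fixed-length vector."""
    h = np.zeros(d_hid)
    for tok in src_ids:
        h = np.tanh(emb[tok] @ W_xh + h @ W_hh)
    return h                               # the fixed-length bottleneck

def decode(h, max_len=5):
    """Greedily generate target words conditioned only on that one vector."""
    out, tok = [], 0                       # assume id 0 is <bos>
    for _ in range(max_len):
        h = np.tanh(emb[tok] @ W_xh + h @ W_hh)
        tok = int(np.argmax(h @ W_hy))     # pick the most probable word id
        out.append(tok)
    return out

context = encode([3, 7, 42])               # a source sentence of ANY length ...
target  = decode(context)                  # ... squeezed into d_hid numbers
```

However long the source sentence, `encode` returns the same `d_hid`-dimensional vector, which is exactly the information bottleneck the abstract points out.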
Then, this paper surveys the current state of NMT research in detail, including: attention-based NMT, whose attention mechanism is designed to predict a soft alignment between the source and target languages and has greatly improved NMT performance; character-level NMT, which aims to solve the problems of word-level models through character-level and subword-level translation; multilingual NMT, which uses a single model to translate between multiple languages, covering one-to-many, many-to-one, and many-to-many settings; the vocabulary-restriction problem in NMT, focusing on very large target vocabularies, including out-of-vocabulary (OOV) words, and on handling long sentences; leveraging prior knowledge in NMT, for example incorporating and effectively utilizing word-reordering knowledge, morphological features, bilingual dictionaries, syntactic information, and monolingual data; low-resource NMT, which addresses the scarce training data available for some language pairs; and new NMT architectures and paradigms, for example multi-model NMT, NMT built on non-recurrent neural networks, and advanced learning paradigms such as generative adversarial networks (GANs) and reinforcement learning. We then summarize some successful machine translation evaluation methods based purely on neural networks. Finally, the paper gives an outlook on the development of NMT and summarizes the key challenges and possible solutions. © 2018, Science Press. All rights reserved.
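The soft alignment that attention computes can likewise be sketched in NumPy: instead of one fixed context vector, the decoder takes a weighted average of all encoder states at every step. The dot-product scoring function and the toy dimensions below are assumptions chosen for brevity, not the specific attention variant any cited model uses:

```python
import numpy as np

def attend(enc_states, dec_state):
    """Soft alignment: score every source position, softmax, weighted sum."""
    scores = enc_states @ dec_state            # one score per source position
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()                   # the "soft alignment" weights
    context = weights @ enc_states             # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(1)
enc = rng.normal(size=(4, 16))                 # 4 source positions, dim 16
dec = rng.normal(size=16)                      # current decoder state
ctx, w = attend(enc, dec)
```

Because `ctx` is recomputed at each decoding step from all encoder states, no single fixed-length vector has to carry the whole sentence, which is why attention alleviates the bottleneck described earlier.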
Pages: 2734-2755
Page count: 21