A Survey of Neural Machine Translation

Cited: 0
Authors
Li Y.-C. [1,2]
Xiong D.-Y. [1]
Zhang M. [1]
Affiliations
[1] School of Computer Science and Technology, Soochow University, Suzhou, 215006, Jiangsu
[2] Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou
Source
Zhang, Min (minzhang@suda.edu.cn) | 2018 / Science Press / Vol. 41
Funding
National Natural Science Foundation of China
Keywords
Attention mechanism; Machine translation; Machine translation evaluation; Neural machine translation; Recurrent neural network; Sequence-to-sequence model;
DOI
10.11897/SP.J.1016.2018.02734
Abstract
Machine translation is a subfield of artificial intelligence and natural language processing that investigates how to automatically translate from a source language into a target language. Neural machine translation (NMT) is a recently proposed framework for machine translation based purely on sequence-to-sequence models, in which a single large neural network transforms the source language sequence into the target language sequence, leading to a novel paradigm for machine translation. After years of development, NMT has achieved substantial results and has gradually surpassed statistical machine translation (SMT) on various language pairs, becoming a new machine translation approach with great potential. In this paper, we systematically describe the vanilla NMT model and the different types of NMT models, organized by the principles of the classical NMT model, the common problems shared by NMT models, novel models and architectures, and other classification criteria. First, we introduce encoder-decoder based NMT as well as the problems and challenges of the model. In the vanilla NMT model, the encoder, implemented by a recurrent neural network (RNN), reads an input sequence and produces a fixed-length vector, from which the decoder generates a sequence of target language words. The biggest issue with the vanilla model is that a sentence of any length must be compressed into a fixed-length vector, which may lose important information from the sentence and constitutes a bottleneck in NMT. Next, we summarize the neural networks used in NMT, including RNNs, convolutional neural networks (CNN), long short-term memory (LSTM) networks, gated recurrent neural networks, neural Turing machines (NTM), and memory networks. Then, this paper introduces the current research on NMT in detail, including: attention-based NMT, whose attention mechanism is designed to predict a soft alignment between the source language and the target language and has greatly improved NMT performance; character-level NMT, which aims to solve the problems of word-level models through character-level and subword-level translation; multilingual NMT, which uses a single model to translate between multiple languages, covering one-to-many, many-to-one, and many-to-many models; vocabulary restrictions in NMT, focusing on handling very large target vocabularies and out-of-vocabulary (OOV) words, as well as the long-sentence problem; leveraging prior knowledge in NMT, for example incorporating word-reordering knowledge, morphological features, bilingual dictionaries, syntactic information, and monolingual data; low-resource NMT, which addresses the scarcity of training data for some language pairs; and new paradigms for NMT architectures, for example multi-model NMT, NMT based on non-recurrent neural networks, and advanced learning paradigms for NMT such as generative adversarial networks (GAN) and reinforcement learning. Last, we summarize successful evaluation methods for machine translation based purely on neural networks. Finally, the paper gives an outlook on the development trends of NMT and summarizes the key challenges and possible solutions. © 2018, Science Press. All rights reserved.
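To make the encoder-decoder and attention descriptions above concrete, the following display sketches the standard attention formulation (in the spirit of Bahdanau-style attention); the notation, with encoder annotations h_j, decoder state s_i, context vector c_i, and alignment model a, is the conventional one and is assumed here rather than taken from the surveyed paper:

e_{ij} = a(s_{i-1}, h_j), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{m} \exp(e_{ik})}, \qquad c_i = \sum_{j=1}^{m} \alpha_{ij} h_j, \qquad p(y_i \mid y_{<i}, \mathbf{x}) = g(y_{i-1}, s_i, c_i)

Because each target position i receives its own context vector c_i rather than a single fixed-length sentence vector, this formulation directly relieves the compression bottleneck noted in the abstract.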
Pages: 2734-2755
Number of pages: 21