Machine translation is a subfield of artificial intelligence and natural language processing that investigates transforming text in a source language into a target language. Neural machine translation (NMT) is a recently proposed framework for machine translation based purely on sequence-to-sequence models, in which a large neural network transforms the source language sequence into the target language sequence, giving rise to a novel paradigm for machine translation. After years of development, NMT has achieved substantial results and has gradually surpassed the statistical machine translation (SMT) method on various language pairs, becoming a new machine translation approach with great potential. In this paper, we systematically describe the vanilla NMT model and the different types of NMT models, organized by the principles of the classical NMT model, the problems common to NMT models, novel models and new architectures, and other classification criteria.

First, we introduce encoder-decoder based NMT together with its problems and challenges. In the vanilla NMT model, the encoder, implemented by a recurrent neural network (RNN), reads an input sequence and produces a fixed-length vector, from which the decoder generates a sequence of target language words. The biggest issue with the vanilla model is that a sentence of any length must be compressed into a fixed-length vector, which may lose important information about the sentence and thus constitutes a bottleneck in NMT. Next, we summarize the neural networks used in NMT, including RNNs, convolutional neural networks (CNNs), long short-term memory (LSTM) networks, gated recurrent neural networks, neural Turing machines (NTMs), and memory networks, among others.

Then, the paper reviews the current state of NMT research in detail: attention-based NMT, whose attention mechanism is designed to predict a soft alignment between the source language and the target language and has thereby greatly improved the performance of NMT; character-level NMT, which aims to solve the problems of the word-level model through character-level translation, subword-level translation, and related techniques; multilingual NMT, which uses a single NMT model to translate between multiple languages, covering one-to-many, many-to-one, and many-to-many models; the vocabulary restriction problem in NMT, which focuses on handling the very large target vocabulary, including out-of-vocabulary (OOV) words, and on addressing the long-sentence problem; leveraging prior knowledge in NMT, for example the incorporation and effective utilization of word reordering knowledge, morphological features, bilingual dictionaries, syntactic information, and monolingual data; low-resource NMT, which addresses the scarcity of training data for some language pairs; and new paradigms for NMT architectures, for example multi-model NMT, NMT based on non-recurrent neural networks, and advanced learning paradigms for NMT such as generative adversarial networks (GANs) and reinforcement learning.

We then summarize successful evaluation methods for machine translation based purely on neural networks. Finally, the paper offers an outlook on the development trends of NMT and summarizes the key challenges and possible solutions. © 2018, Science Press. All rights reserved.
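To make the fixed-length-vector bottleneck described above concrete, the following is a minimal sketch of the vanilla encoder-decoder idea. All dimensions, the random weight initialization, and the single greedy decoding step are illustrative assumptions, not the configuration of any published system; a real decoder is itself recurrent and feeds back each emitted token.

```python
import numpy as np

# Toy sizes and randomly initialized weights; these values are
# illustrative assumptions, not taken from any published system.
rng = np.random.default_rng(0)
d_emb, d_hid, vocab_tgt = 8, 16, 10

W_xh = rng.normal(scale=0.1, size=(d_hid, d_emb))      # input-to-hidden
W_hh = rng.normal(scale=0.1, size=(d_hid, d_hid))      # hidden-to-hidden
W_hy = rng.normal(scale=0.1, size=(vocab_tgt, d_hid))  # hidden-to-output

def encode(source_embeddings):
    """Run a simple RNN over the source and keep only the final hidden
    state: the fixed-length vector that must summarize the whole sentence."""
    h = np.zeros(d_hid)
    for x in source_embeddings:            # one recurrent step per source token
        h = np.tanh(W_xh @ x + W_hh @ h)
    return h

def decode_step(h):
    """One greedy decoding step conditioned on the fixed-length vector.
    (A real decoder is also recurrent and feeds back the emitted token.)"""
    logits = W_hy @ h
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                   # softmax over the target vocabulary
    return int(probs.argmax())

source = [rng.normal(size=d_emb) for _ in range(5)]    # five fake source tokens
print("first target token id:", decode_step(encode(source)))
```

Whatever the source length, `encode` returns a vector of fixed size `d_hid`; this is exactly the compression bottleneck the abstract identifies in the vanilla model.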
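The soft alignment computed by the attention mechanism can likewise be sketched as a softmax over per-position relevance scores. The dot-product scorer below is an assumption made for brevity; early attention-based NMT models scored each source position with a small feed-forward network, but the weighted-sum structure is the same.

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """Soft alignment: score every source position against the current
    decoder state, normalize with a softmax, and return the weighted sum
    of encoder states as the context vector for the next target word."""
    scores = encoder_states @ decoder_state   # one relevance score per position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # alignment probabilities
    context = weights @ encoder_states        # soft-aligned context vector
    return context, weights

rng = np.random.default_rng(1)
enc = rng.normal(size=(5, 16))   # hidden states for 5 source positions (toy sizes)
dec = rng.normal(size=16)        # current decoder hidden state
context, alignment = attention(dec, enc)
print("alignment over source tokens:", np.round(alignment, 3))
```

Because every source position contributes to every target word, the model no longer depends on a single fixed-length summary, which is how attention relieves the bottleneck of the vanilla model.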