Reinforcement learning is a widely studied class of machine learning methods in which an agent continuously interacts with its environment with the goal of maximizing long-term return. Reinforcement learning is particularly prominent in areas such as control and optimal scheduling. Deep reinforcement learning, which takes large-scale high-dimensional data such as video and images as raw input, uses deep learning methods to extract abstract representations of them, and then applies reinforcement learning methods to obtain optimal strategies, has recently become a research hotspot in artificial intelligence, and a large body of work on it has emerged. For example, the deep Q-network (DQN), one of the best-known deep reinforcement learning models, combines convolutional neural networks (CNNs) with the Q-learning algorithm and uses unprocessed images directly as input. DQN has been applied to learn strategies in complex environments with high-dimensional input. However, few deep reinforcement learning algorithms consider how to ensure security while learning in an unknown environment. Moreover, many reinforcement learning algorithms deliberately add random exploration mechanisms such as ε-greedy to guarantee the diversity of sampled data, so that the algorithm can reach a better approximately optimal solution. Nevertheless, exploration without any security constraint is very dangerous and carries a high risk of disastrous results.

To address this problem, an algorithm named dual deep network based secure deep reinforcement learning (DDN-SDRL) is proposed. The DDN-SDRL algorithm maintains two experience pools: the first stores dangerous samples, i.e., the critical states and dangerous states that caused failure, and the second stores secure samples, which exclude critical and dangerous states. DDN-SDRL trains an additional deep Q-network on the dangerous samples and reconstructs the objective function by introducing a penalty component, so that the new objective is computed from the penalty component together with the original network's objective. The penalty component, trained by a deep Q-network on samples from the critical-state experience pool, represents the critical states that precede failure. Because DDN-SDRL makes full use of the information in critical, dangerous, and secure states, the agent is able to improve security by avoiding most dangerous states during training. DDN-SDRL is a general mechanism for enhancing security during learning and can be combined with a variety of deep network models, such as DQN, the dueling deep Q-network (DuDQN), and the deep recurrent Q-network (DRQN). In the simulation experiments, DQN, DuDQN, and DRQN were each used as the base network, with DDN-SDRL applied to ensure security. Results on six Atari 2600 test games (CrazyClimber, Kangaroo, KungFuMaster, Pooyan, RoadRunner, and Zaxxon) indicate that the proposed DDN-SDRL algorithm makes control safer, more stable, and more effective.
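The abstract does not give the exact form of the reconstructed objective. A minimal sketch, assuming the standard DQN loss over the secure pool, a penalty network Q_d trained on the dangerous-sample pool, and a weighting coefficient λ (the pool symbols, Q_d, and λ are assumptions, not notation from the paper):

$$
L(\theta)=\mathbb{E}_{(s,a,r,s')\sim\mathcal{D}_{\mathrm{sec}}}\Big[\big(r+\gamma\max_{a'}Q(s',a';\theta^{-})-Q(s,a;\theta)\big)^{2}\Big]
\;+\;\lambda\,\mathbb{E}_{(s,a)\sim\mathcal{D}_{\mathrm{dan}}}\big[Q_{d}(s,a;\theta_{d})\big]
$$

Likewise, a brief Python sketch of how the two experience pools might be maintained; the pool sizes, the critical-state horizon K, and all identifiers are illustrative assumptions rather than the authors' implementation:

```python
from collections import deque
import random

# Illustrative sketch of DDN-SDRL's two experience pools; pool sizes,
# the critical-state horizon K, and all names here are assumptions.
K = 4  # last K transitions before a failure are treated as critical/dangerous

secure_pool = deque(maxlen=100_000)  # samples excluding critical and dangerous states
danger_pool = deque(maxlen=20_000)   # critical states plus the dangerous state causing failure

def store_episode(transitions, failed):
    """Split a finished episode's (s, a, r, s', done) tuples between the pools."""
    if failed:
        danger_pool.extend(transitions[-K:])   # critical + dangerous samples
        secure_pool.extend(transitions[:-K])   # remaining samples are secure
    else:
        secure_pool.extend(transitions)

def sample_batches(batch_size=32):
    """Secure-pool batch for the main network, danger-pool batch for the penalty network."""
    main = random.sample(secure_pool, min(batch_size, len(secure_pool)))
    penalty = random.sample(danger_pool, min(batch_size, len(danger_pool)))
    return main, penalty
```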
It can be concluded that environments well suited to DDN-SDRL have the following characteristics: (1) the environment contains many representable dangerous states that lead to failure; (2) dangerous states are clearly distinguishable from secure states; (3) the action space is not too large and the agent can improve through self-training. In such cases, DDN-SDRL yields the greatest improvement over the original deep network. © 2019, Science Press. All rights reserved.