Large neural networks achieve high performance in a wide range of applications, but they are ill-suited to resource-constrained devices such as smartphones. There is therefore a need for small networks that are easy to deploy on such devices while retaining high performance. One approach to this problem is knowledge distillation, which obtains a small, high-performance network by transferring knowledge from a large, high-performance teacher model. However, when there is a large gap in the number of parameters between the teacher model and the student model, distillation may not work well. In this paper, we use a Teacher Assistant (TA) model, whose number of layers lies between that of the teacher and the student, to perform multi-step compression of both the hidden and output layers, building on the technique known as Hint-Based Training. First, we optimize the TA model by distilling the teacher model's hidden and output layers. Then, using the TA model as the teacher, we apply the same hidden- and output-layer distillation to the student model. In this way, we improve the performance of the student model by reducing the model size while increasing the depth of its layers step by step. Experiments show that the proposed method compresses a simple CNN model to about 1/7 of the original number of parameters while maintaining the same classification accuracy on the test dataset. For a student model using ResNet with the bottleneck architecture, the proposed method outperformed the teacher model, which had about 8 times as many parameters. In addition, the proposed method achieved the best student-model performance when compared with existing studies.
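The sketch below illustrates the two-step distillation described above. It is a minimal example, not the paper's exact implementation: it assumes PyTorch-style models that return a hidden ("hint") feature together with their logits, and the regressor, temperature T, loss weights alpha and beta, and optimizer settings are illustrative assumptions.

# Minimal sketch of one compression step (teacher -> TA, then TA -> student).
# Assumptions: models return (hidden_feature, logits); a small regressor maps the
# student's hint to the teacher's hint shape; T, alpha, beta, and Adam are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_step(teacher, student, regressor, loader, T=4.0, alpha=0.5, beta=0.5, epochs=1):
    """Transfer knowledge from `teacher` to `student` through both a hidden
    ("hint") layer and the output layer."""
    teacher.eval()
    opt = torch.optim.Adam(list(student.parameters()) + list(regressor.parameters()), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                t_hint, t_logits = teacher(x)      # teacher's hidden feature and logits
            s_hint, s_logits = student(x)
            # Hint-based loss: match the (projected) student hidden feature to the teacher's.
            hint_loss = F.mse_loss(regressor(s_hint), t_hint)
            # Output-layer distillation: softened KL term plus hard-label cross-entropy.
            kd_loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                               F.softmax(t_logits / T, dim=1),
                               reduction="batchmean") * (T * T)
            ce_loss = F.cross_entropy(s_logits, y)
            loss = beta * hint_loss + alpha * kd_loss + (1 - alpha) * ce_loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

# Multi-step compression as described in the abstract:
# ta      = distill_step(teacher, ta, ta_regressor, train_loader)
# student = distill_step(ta, student, student_regressor, train_loader)

Applying the same step twice, first from the teacher to the TA and then from the TA to the student, is what makes the compression multi-step rather than a single large jump in model size.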