With the rapid development of artificial intelligence, deep neural networks have been widely adopted across research fields and have achieved great success, but they also face a number of challenges. First, to solve more complex problems and improve training performance, network architectures have grown progressively deeper and more complex, which makes them difficult to deploy in mobile computing environments with limited resources and power budgets. Knowledge distillation was originally proposed for model compression: a learning paradigm that transfers knowledge from a large teacher model to a compact student model to improve the student's performance. As the field has developed, however, its teacher-student architecture, as a special form of transfer learning, has evolved into a rich variety of variants and architectures and has gradually been extended to many deep learning tasks and scenarios, including computer vision, natural language processing, and recommendation systems. In addition, by transferring knowledge between neural network models, knowledge distillation can connect cross-modal or cross-domain learning tasks and mitigate knowledge forgetting; it can also separate models from data, thereby protecting private data. Knowledge distillation is playing an increasingly important role across the fields of artificial intelligence and has become a general-purpose means of solving many practical problems. This paper surveys the main literature on knowledge distillation, elaborates its learning framework, compares and analyzes related work from multiple classification perspectives, introduces the main application scenarios, and finally discusses future development trends and provides insights. © 2022, Science Press. All rights reserved.
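The abstract describes knowledge distillation as transferring knowledge from a large teacher model to a compact student model. As an illustration only, the sketch below shows the classic soft-label distillation objective (a temperature-scaled KL-divergence term combined with the usual cross-entropy on hard labels); the temperature `T` and weighting `alpha` are hypothetical hyperparameters, not values taken from this paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Minimal sketch of the classic soft-label knowledge distillation loss.

    T and alpha are illustrative hyperparameters, not values from the paper.
    """
    # Soft targets from the (frozen) teacher, softened by temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # KL term is scaled by T^2 so its gradients are comparable to the hard loss.
    kd_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Standard supervised cross-entropy on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Usage sketch: the teacher is frozen, only the student is trained.
# with torch.no_grad():
#     teacher_logits = teacher(inputs)
# loss = distillation_loss(student(inputs), teacher_logits, labels)
```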