As deep learning is applied across a growing range of fields, its potential in educational technology, particularly for recognizing students' learning emotions, has begun to attract attention. Real-time, accurate identification of learning emotions is crucial for personalized teaching and for improving learning efficiency. This paper addresses the automatic recognition of student learning emotions with deep learning, aiming to improve the accuracy and practicality of recognition by optimizing both the preprocessing of facial expression images and the temporal expression recognition model. Faces are first detected using Haar-like features and the AdaBoost cascade method; the detected facial images are then normalized in scale, angle, and grayscale to make the system more robust to variations in facial images. Next, a temporal expression recognition model based on a multi-attention fusion network is proposed. The model combines shallow and deep features learned by the network with prior knowledge from the Facial Action Coding System (FACS) to capture the dynamic changes in facial expressions in finer detail. Finally, by introducing three different attention mechanisms, the study significantly improves the efficiency and accuracy of emotion feature recognition in sequential data. The findings advance learning emotion recognition technology and offer useful insights for educational practice.
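
To make the face-detection step more concrete, the following is a minimal sketch of how a single Haar-like feature can be evaluated in constant time from an integral image (summed-area table), which is the standard building block behind Haar/AdaBoost cascade detectors. It is an illustration under stated assumptions rather than the implementation used in this paper: the function names (`integral_image`, `rect_sum`, `two_rect_horizontal`) and the toy 2x4 image are hypothetical, and a real detector would evaluate many such features inside a trained cascade.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def integral_image(img: Image) -> List[List[float]]:
    """Build a summed-area table padded with a leading zero row and column.

    ii[y][x] holds the sum of all pixels img[r][c] with r < y and c < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[float]], x: int, y: int, w: int, h: int) -> float:
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y).

    Uses four table lookups regardless of the rectangle size.
    """
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_horizontal(ii: List[List[float]], x: int, y: int, w: int, h: int) -> float:
    """Two-rectangle Haar-like feature: left half minus right half.

    Assumes w is even so that both halves have the same area.
    """
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return left - right


if __name__ == "__main__":
    # Hypothetical toy image: brighter on the right than on the left.
    toy = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
    ]
    table = integral_image(toy)
    print(rect_sum(table, 0, 0, 4, 2))             # sum over the whole image
    print(two_rect_horizontal(table, 0, 0, 4, 2))  # negative: right half is brighter
```

In an AdaBoost cascade, each weak classifier thresholds one such feature value, and the cascade rejects most non-face windows after only a few cheap evaluations; the integral image is what keeps each evaluation cheap.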