As an essential component of human-computer interaction, affective computing has garnered extensive attention from the academic community. Identifying electroencephalogram (EEG) features with stronger time-space-frequency correlations and developing efficient, lightweight emotion recognition models are key focuses in this field. This paper designs an emotion recognition framework and optimizes a deep learning model, the Efficient Capsule Network with Convolutional Attention (ECNCA). First, by exploring the temporal, frequency, and spatial features of EEG data, we concatenate and fuse the theta, alpha, beta, and gamma frequency bands to fully exploit the information in EEG signals for emotion classification. Second, ECNCA enhances the input data through Convolutional Neural Networks (CNNs) and attention mechanisms and employs an Efficient-Capsule module to classify emotions, achieving high accuracy at low computational cost. Finally, we conducted extensive experiments on the SEED and DEAP datasets, achieving average accuracies of 95.26% ± 0.89% and 92.12% ± 1.38% on the three-class and four-class emotion classification tasks, respectively. After calibration, the model achieved average accuracies of 94.67% ± 1.78% and 91.39% ± 1.99%. Additional experiments demonstrated that ECNCA also has an advantage in computational cost. The results indicate that the proposed framework can effectively classify emotions in complex environments across different emotion datasets, providing significant reference value for practical applications of affective computing.
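The band-fusion step described above can be illustrated with a minimal sketch. This is not the paper's implementation: the channel count (62, as in SEED) and the use of one feature vector per band are assumptions for illustration only.

```python
import numpy as np

# Assumed setup: 62 EEG channels (as in SEED), one feature value per channel
# per frequency band. The actual per-band features in the paper may differ.
N_CHANNELS = 62
BAND_ORDER = ("theta", "alpha", "beta", "gamma")

rng = np.random.default_rng(0)
# Placeholder per-band feature vectors standing in for real EEG features.
bands = {b: rng.standard_normal(N_CHANNELS) for b in BAND_ORDER}

# Concatenate the four band feature vectors into one fused input vector,
# so downstream layers see all bands jointly.
fused = np.concatenate([bands[b] for b in BAND_ORDER])
print(fused.shape)  # (248,)
```

The fused vector would then serve as the input representation that the CNN-and-attention front end enhances before capsule-based classification.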