Class-Balanced and Local Median Loss Jointly Supervised for Wild Facial Expression Recognition

Cited by: 0
Authors: Shi C. [1]; Tian M. [1]
Affiliation: [1] Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing
Keywords: Convolutional neural networks; Facial expression recognition; Imbalanced dataset; Intra-class variation; Loss function
DOI: 10.3724/SP.J.1089.2020.17984
Abstract
Over the past few years, convolutional neural networks have performed well on laboratory-controlled facial expression recognition, but recognition in the wild remains a challenging problem. This paper proposes a loss function, the CALM (class-balanced and local median) loss, to address the imbalanced data of wild facial expression datasets and the large intra-class variation caused by pose, illumination, and gender. The CALM loss consists of two parts: a class-balanced Softmax loss and a local median loss. The class-balanced Softmax loss marks fear and disgust, two expressions with few samples that are prone to misclassification, as difficult classes, and the other five expressions as easy classes; during training, the weights of difficult samples are adaptively increased to improve their recognition accuracy and thereby the average accuracy of expression recognition. In addition, each category contains samples that lie far from the majority of that category, and these outliers cause a class center computed as the mean to deviate from the bulk of the samples. The local median loss instead takes, for each sample, the median of several same-class neighbors as the class center, which reduces the influence of outlier samples on the choice of class center. The average recognition accuracy on the RAF (real-world affective faces) dataset is 1.32% higher than that of the local subclass method, demonstrating the effectiveness of the proposed approach. © 2020, Beijing China Science Journal Publishing Co. Ltd. All rights reserved.
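The abstract describes both CALM terms in enough detail to sketch them. Below is a minimal PyTorch sketch, not the authors' implementation: the class indices assumed for fear and disgust, the fixed up-weighting factor hard_weight (standing in for the paper's adaptive re-weighting), the neighbor count k, and the balance factor lam are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

HARD_CLASSES = (1, 2)  # assumed label indices for "fear" and "disgust"

def class_balanced_softmax(logits, labels, hard_weight=2.0):
    # Per-sample cross-entropy, then up-weight samples from the
    # difficult (under-represented) classes before averaging.
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    weights = torch.ones_like(per_sample)
    for c in HARD_CLASSES:
        weights[labels == c] = hard_weight
    return (weights * per_sample).mean()

def local_median_loss(features, labels, k=5):
    # Pull each feature toward the element-wise median of its k
    # nearest same-class neighbors; the median is robust to outliers.
    loss, count = features.new_zeros(()), 0
    for i in range(features.size(0)):
        same = (labels == labels[i]).nonzero(as_tuple=True)[0]
        same = same[same != i]
        if same.numel() == 0:
            continue  # no same-class neighbor in this batch
        dists = (features[same] - features[i]).pow(2).sum(dim=1)
        nearest = same[dists.argsort()[:k]]
        center = features[nearest].median(dim=0).values
        loss = loss + (features[i] - center).pow(2).sum()
        count += 1
    return loss / max(count, 1)

def calm_loss(logits, features, labels, lam=0.1):
    # Joint supervision: balanced Softmax plus the local median term.
    return class_balanced_softmax(logits, labels) + lam * local_median_loss(features, labels)

# Toy usage: a batch of 8 samples, 7 expression classes, 16-dim features.
if __name__ == "__main__":
    logits = torch.randn(8, 7, requires_grad=True)
    features = torch.randn(8, 16, requires_grad=True)
    labels = torch.randint(0, 7, (8,))
    print(calm_loss(logits, features, labels))
```

Replacing the mean with the element-wise median of nearby same-class features is what gives the second term its robustness: a single outlier shifts a mean-based center in proportion to its distance, but leaves a median-based center essentially unchanged.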
Pages: 1484-1491
Page count: 7