A multi-scale multi-attention network for dynamic facial expression recognition

Cited by: 0

Authors:

Xia, Xiaohan ^{[1]}

Yang, Le ^{[1]}

Wei, Xiaoyong ^{[2,3]}

Sahli, Hichem ^{[4,5]}

Jiang, Dongmei ^{[1,3]}

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Natl Engn Lab Integrated Aerosp Ground Ocean Big, Shaanxi Key Lab Speech & Image Informat Proc, Youyi Xilu 127, Xian 710072, Peoples R China

[2] Sichuan Univ, Sch Comp Sci, Chengdu 610065, Peoples R China

[3] Peng Cheng Lab, Vanke Cloud City Phase 1,Bldg 8,Xili St, Shenzhen 518055, Guangdong, Peoples R China

[4] Vrije Univ Brussel VUB, Dept Elect & Informat ETRO, Pl Laan 2, B-1050 Brussels, Belgium

[5] Interunivers Microelect Ctr IMEC, Kapeldreef 75, B-3001 Heverlee, Belgium

Source:

MULTIMEDIA SYSTEMS | 2022 / Vol. 28 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Facial expression recognition; Multi-scale multi-attention network (MSMA-Net); Spatial attention; Temporal attention; MODEL;

DOI:

10.1007/s00530-021-00849-8

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Characterizing spatial information and modelling temporal dynamics of facial images are key challenges for dynamic facial expression recognition (FER). In this paper, we propose an end-to-end multi-scale multi-attention network (MSMA-Net) for dynamic FER. In our model, the spatio-temporal features are encoded at two scales, i.e. the entire face and local facial patches. For each scale, we adopt a 2D convolutional neural network (CNN) to capture frame-based spatial information, and a 3D CNN to depict the short-term dynamics in the temporal sequence. Moreover, we propose a multi-attention mechanism that combines spatial and temporal attention models. The temporal attention is applied to the image sequence to highlight expressive frames within the whole sequence, and the spatial attention is applied at the patch level to learn salient facial features. Comprehensive experiments on publicly available datasets (Aff-Wild2, RML, and AFEW) show that the proposed MSMA-Net model automatically highlights salient expressive frames, within which salient facial features are learned, yielding results that are better than or highly competitive with state-of-the-art methods.
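
As an illustration of the temporal attention idea described in the abstract (weighting frames so that expressive ones dominate the sequence-level representation), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy feature vectors, and the hand-picked frame scores are all assumptions made for illustration. In a network such as MSMA-Net the scores would be produced by learned layers rather than supplied by hand.

```python
import math


def softmax(scores):
    """Turn raw per-frame scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, frame_scores):
    """Attention-weighted average of per-frame feature vectors.

    frame_features: list of equal-length feature vectors, one per frame
                    (hypothetical stand-ins for CNN outputs).
    frame_scores:   one raw score per frame (hypothetical; assumed to be
                    learned in a real model).
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy sequence of three frames; the middle frame gets the highest score.
features = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
scores = [0.0, 2.0, 0.0]
pooled, weights = temporal_attention_pool(features, scores)
print(weights)
print(pooled)
```

Because the middle frame receives the largest score, its features contribute most to the pooled vector. Spatial attention over facial patches could be sketched the same way, with patches taking the place of frames.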


Pages: 479-493

Page count: 15

