Emotion and Gesture Guided Action Recognition in Videos Using Supervised Deep Networks

Cited by: 1
Authors:
Nigam, Nitika [1 ]
Dutta, Tanima [1 ]
Affiliations:
[1] Indian Inst Technol BHU Varanasi, Varanasi 221005, Uttar Pradesh, India
Keywords:
Videos; Feature extraction; Visualization; Spatiotemporal phenomena; Convolution; Tensors; Emotion recognition; Action recognition; deep neural networks (DNNs); long temporal context; Visual Attention with Long-term Context (VALC) dataset; visual attention
DOI:
10.1109/TCSS.2022.3187198
CLC Number: TP3 [Computing Technology, Computer Technology]
Discipline Code: 0812
Abstract:
Emotions and gestures are essential elements in improving social intelligence and predicting real human action. In recent years, recognition of human visual actions using deep neural networks (DNNs) has gained wide popularity in multimedia and computer vision. However, ambiguous action classes, such as "praying" and "pleading", remain challenging to classify due to their similar visual cues. Correctly classifying such ambiguous actions requires attending to the associated features of facial expressions and gestures, together with the long-term context of a video. This article proposes an attention-aware DNN named human action attention network (HAANet) that can capture long-term temporal context to recognize actions in videos. The visual attention network extracts discriminative features of facial expressions and gestures in the spatial and temporal dimensions. We further consolidate a class-specific attention pooling mechanism to capture transitions in semantic traits over time. The efficacy of HAANet is demonstrated on five benchmark datasets. To the best of our knowledge, no publicly available dataset in the literature distinguishes ambiguous human actions by focusing on the visual cues of the human in action. This motivated us to create a new dataset, known as Visual Attention with Long-term Context (VALC), which contains 32 actions with about 101 videos per class and an average length of 30 s. HAANet outperforms state-of-the-art methods on the UCF101, ActivityNet, and Breakfast-Actions datasets in terms of accuracy.
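The abstract describes class-specific attention pooling over per-frame features, i.e., each action class learns its own temporal weighting before the clip-level decision. The sketch below is only a generic illustration of that idea, not the authors' HAANet code; all names (ClassSpecificAttentionPool, feat_dim, num_classes) and the exact weighting scheme are assumptions for illustration.

```python
# Minimal sketch of class-specific temporal attention pooling, assuming
# per-frame feature vectors have already been extracted by a visual backbone.
# Illustrative only; the actual HAANet design is specified in the paper.
import torch
import torch.nn as nn


class ClassSpecificAttentionPool(nn.Module):
    """Pools T per-frame features into one clip-level score per class,
    weighting frames differently for each action class."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # One temporal-attention scorer and one classifier per class.
        self.attn = nn.Linear(feat_dim, num_classes)  # frame relevance per class
        self.clf = nn.Linear(feat_dim, num_classes)   # frame-level class evidence

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, T, feat_dim)
        attn_weights = torch.softmax(self.attn(frame_feats), dim=1)  # (B, T, C)
        frame_scores = self.clf(frame_feats)                         # (B, T, C)
        # Each class aggregates frame evidence with its own temporal weights.
        return (attn_weights * frame_scores).sum(dim=1)              # (B, C)


if __name__ == "__main__":
    pool = ClassSpecificAttentionPool(feat_dim=512, num_classes=32)
    feats = torch.randn(4, 30, 512)   # 4 clips, 30 frames, 512-D features
    print(pool(feats).shape)          # torch.Size([4, 32])
```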
Pages: 2546-2556
Number of pages: 11
Related Papers (50 records in total; entries [31]-[40] shown)
  • [31] Cross-Subject Emotion Recognition Using Deep Adaptation Networks
    Li, He
    Jin, Yi-Ming
    Zheng, Wei-Long
    Lu, Bao-Liang
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2018), PT V, 2018, 11305 : 403 - 413
  • [32] Emotion Recognition from Videos Using Facial Expressions
    Selvi, P. Tamil
    Vyshnavi, P.
    Jagadish, R.
    Srikumar, Shravan
    Veni, S.
    [J]. ARTIFICIAL INTELLIGENCE AND EVOLUTIONARY COMPUTATIONS IN ENGINEERING SYSTEMS, ICAIECES 2016, 2017, 517 : 565 - 576
  • [33] Deep Dynamic Neural Networks for Gesture Segmentation and Recognition
    Wu, Di
    Shao, Ling
    [J]. COMPUTER VISION - ECCV 2014 WORKSHOPS, PT I, 2015, 8925 : 552 - 571
  • [34] Emotion Recognition in the Wild from Videos using Images
    Bargal, Sarah Adel
    Barsoum, Emad
    Ferrer, Cristian Canton
    Zhang, Cha
    [J]. ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 433 - 436
  • [35] AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos
    Kar, Amlan
    Rai, Nishant
    Sikka, Karan
    Sharma, Gaurav
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 5699 - 5708
  • [36] Dynamic Sampling Networks for Efficient Action Recognition in Videos
    Zheng, Yin-Dong
    Liu, Zhaoyang
    Lu, Tong
    Wang, Limin
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 7970 - 7983
  • [37] Robustness of Deep LSTM Networks in Freehand Gesture Recognition
    Schak, Monika
    Gepperth, Alexander
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: IMAGE PROCESSING, PT III, 2019, 11729 : 330 - 343
  • [38] Video-based Emotion Recognition Using Deeply-Supervised Neural Networks
    Fan, Yingruo
    Lam, Jacqueline C. K.
    Li, Victor O. K.
    [J]. ICMI'18: PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2018, : 584 - 588
  • [39] Emotion Prediction from User-Generated Videos by Emotion Wheel Guided Deep Learning
    Ho, Che-Ting
    Lin, Yu-Hsun
    Wu, Ja-Ling
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2016, PT I, 2016, 9947 : 3 - 12
  • [40] A Novel Supervised Bimodal Emotion Recognition Approach Based on Facial Expression and Body Gesture
    Yan, Jingjie
    Lu, Guanming
    Bai, Xiaodong
    Li, Haibo
    Sun, Ning
    Liang, Ruiyu
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2018, E101A (11) : 2003 - 2006