Emotion and Gesture Guided Action Recognition in Videos Using Supervised Deep Networks

Cited by: 1
Authors:
Nigam, Nitika [1 ]
Dutta, Tanima [1 ]
Affiliations:
[1] Indian Inst Technol BHU Varanasi, Varanasi 221005, Uttar Pradesh, India
Keywords:
Videos; Feature extraction; Visualization; Spatiotemporal phenomena; Convolution; Tensors; Emotion recognition; Action recognition; deep neural networks (DNNs); long temporal context; Visual Attention with Long-term Context (VALC) dataset; visual attention
DOI:
10.1109/TCSS.2022.3187198
CLC Number: TP3 [Computing Technology, Computer Technology]
Discipline Code: 0812
Abstract:
Emotions and gestures are essential elements in improving social intelligence and predicting real human action. In recent years, recognition of human visual actions using deep neural networks (DNNs) has gained wide popularity in multimedia and computer vision. However, ambiguous action classes, such as "praying" and "pleading", remain challenging to classify due to their similar visual cues. Correctly classifying such ambiguous actions requires attending to the associated features of facial expressions and gestures, together with the long-term context of a video. This article proposes an attention-aware DNN named human action attention network (HAANet) that can capture long-term temporal context to recognize actions in videos. The visual attention network extracts discriminative features of facial expressions and gestures in the spatial and temporal dimensions. We further consolidate a class-specific attention pooling mechanism to capture transitions in semantic traits over time. The efficacy of HAANet is demonstrated on five benchmark datasets. To the best of our knowledge, no publicly available dataset in the literature distinguishes ambiguous human actions by focusing on the visual cues of the human in action. This motivated us to create a new dataset, known as Visual Attention with Long-term Context (VALC), which contains 32 actions with about 101 videos per class and an average length of 30 s. HAANet outperforms state-of-the-art methods on the UCF101, ActivityNet, and Breakfast-Actions datasets in terms of accuracy.
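The abstract describes class-specific attention pooling over per-frame features, i.e., each action class learns its own temporal weighting before the clip-level decision. The sketch below is only a generic illustration of that idea, not the authors' HAANet code; all names (ClassSpecificAttentionPool, feat_dim, num_classes) and the exact weighting scheme are assumptions for illustration.

```python
# Minimal sketch of class-specific temporal attention pooling, assuming
# per-frame feature vectors have already been extracted by a visual backbone.
# Illustrative only; the actual HAANet design is specified in the paper.
import torch
import torch.nn as nn


class ClassSpecificAttentionPool(nn.Module):
    """Pools T per-frame features into one clip-level score per class,
    weighting frames differently for each action class."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # One temporal-attention scorer and one classifier per class.
        self.attn = nn.Linear(feat_dim, num_classes)  # frame relevance per class
        self.clf = nn.Linear(feat_dim, num_classes)   # frame-level class evidence

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, T, feat_dim)
        attn_weights = torch.softmax(self.attn(frame_feats), dim=1)  # (B, T, C)
        frame_scores = self.clf(frame_feats)                         # (B, T, C)
        # Each class aggregates frame evidence with its own temporal weights.
        return (attn_weights * frame_scores).sum(dim=1)              # (B, C)


if __name__ == "__main__":
    pool = ClassSpecificAttentionPool(feat_dim=512, num_classes=32)
    feats = torch.randn(4, 30, 512)   # 4 clips, 30 frames, 512-D features
    print(pool(feats).shape)          # torch.Size([4, 32])
```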
Pages: 2546-2556
Number of pages: 11
Related Papers (50 records in total; entries [31]-[40] shown)
  • [31] Cross-Subject Emotion Recognition Using Deep Adaptation Networks
    Li, He
    Jin, Yi-Ming
    Zheng, Wei-Long
    Lu, Bao-Liang
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2018), PT V, 2018, 11305 : 403 - 413
  • [32] Emotion Recognition from Videos Using Facial Expressions
    Selvi, P. Tamil
    Vyshnavi, P.
    Jagadish, R.
    Srikumar, Shravan
    Veni, S.
    [J]. ARTIFICIAL INTELLIGENCE AND EVOLUTIONARY COMPUTATIONS IN ENGINEERING SYSTEMS, ICAIECES 2016, 2017, 517 : 565 - 576
  • [33] Deep Dynamic Neural Networks for Gesture Segmentation and Recognition
    Wu, Di
    Shao, Ling
    [J]. COMPUTER VISION - ECCV 2014 WORKSHOPS, PT I, 2015, 8925 : 552 - 571
  • [34] Emotion Recognition in the Wild from Videos using Images
    Bargal, Sarah Adel
    Barsoum, Emad
    Ferrer, Cristian Canton
    Zhang, Cha
    [J]. ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 433 - 436
  • [35] AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos
    Kar, Amlan
    Rai, Nishant
    Sikka, Karan
    Sharma, Gaurav
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 5699 - 5708
  • [36] Dynamic Sampling Networks for Efficient Action Recognition in Videos
    Zheng, Yin-Dong
    Liu, Zhaoyang
    Lu, Tong
    Wang, Limin
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 7970 - 7983
  • [37] Robustness of Deep LSTM Networks in Freehand Gesture Recognition
    Schak, Monika
    Gepperth, Alexander
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: IMAGE PROCESSING, PT III, 2019, 11729 : 330 - 343
  • [38] Video-based Emotion Recognition Using Deeply-Supervised Neural Networks
    Fan, Yingruo
    Lam, Jacqueline C. K.
    Li, Victor O. K.
    [J]. ICMI'18: PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2018, : 584 - 588
  • [39] Emotion Prediction from User-Generated Videos by Emotion Wheel Guided Deep Learning
    Ho, Che-Ting
    Lin, Yu-Hsun
    Wu, Ja-Ling
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2016, PT I, 2016, 9947 : 3 - 12
  • [40] A Novel Supervised Bimodal Emotion Recognition Approach Based on Facial Expression and Body Gesture
    Yan, Jingjie
    Lu, Guanming
    Bai, Xiaodong
    Li, Haibo
    Sun, Ning
    Liang, Ruiyu
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2018, E101A (11) : 2003 - 2006