Emotion and Gesture Guided Action Recognition in Videos Using Supervised Deep Networks

Cited by: 1
Authors
Nigam, Nitika [1 ]
Dutta, Tanima [1 ]
Affiliations
[1] Indian Inst Technol BHU Varanasi, Varanasi 221005, Uttar Pradesh, India
Keywords
Videos; Feature extraction; Visualization; Spatiotemporal phenomena; Convolution; Tensors; Emotion recognition; Action recognition; deep neural networks (DNNs); long temporal context; Visual Attention with Long-term Context (VALC) dataset; LINK; visual attention;
DOI
10.1109/TCSS.2022.3187198
CLC Classification Number
TP3 [Computing and Computer Technology];
Subject Classification Code
0812;
Abstract
Emotions and gestures are essential elements in improving social intelligence and predicting real human action. In recent years, recognition of human visual actions using deep neural networks (DNNs) has gained wide popularity in multimedia and computer vision. However, ambiguous action classes, such as "praying" and "pleading", are still challenging to classify due to their similar visual cues. Correct classification of such ambiguous actions requires attending to the associated features of facial expressions and gestures, together with the long-term context of a video. This article proposes an attention-aware DNN named the human action attention network (HAANet) that can capture long-term temporal context to recognize actions in videos. The visual attention network extracts discriminative features of facial expressions and gestures in the spatial and temporal dimensions. We further consolidate a class-specific attention pooling mechanism to capture transitions in semantic traits over time. The efficacy of HAANet is demonstrated on five benchmark datasets. To the best of our knowledge, no publicly available dataset in the literature distinguishes ambiguous human actions by focusing on the visual cues of the human in action. This motivated us to create a new dataset, known as Visual Attention with Long-term Context (VALC), which contains 32 actions with about 101 videos per class and an average length of 30 s. HAANet outperforms existing methods on the UCF101, ActivityNet, and Breakfast-Actions datasets in terms of accuracy.
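The class-specific attention pooling mentioned in the abstract can be sketched as follows. The record does not give the paper's exact formulation, so this is a minimal illustrative sketch assuming one learned attention vector per action class (`w_att`), used to weight per-frame features over time before classification; all names and shapes here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def class_specific_attention_pooling(frame_feats, w_att):
    """Pool per-frame features into one descriptor per class.

    frame_feats: (T, D) features for T frames, e.g. from a visual
                 attention backbone (assumed input, not the paper's exact one).
    w_att:       (C, D) one hypothetical attention vector per action class.
    """
    scores = frame_feats @ w_att.T     # (T, C) relevance of each frame to each class
    alpha = softmax(scores, axis=0)    # temporal attention weights, sum to 1 per class
    pooled = alpha.T @ frame_feats     # (C, D) class-specific pooled descriptors
    return pooled, alpha

rng = np.random.default_rng(0)
feats = rng.normal(size=(30, 64))      # 30 frames, 64-dim features
w_att = rng.normal(size=(32, 64))      # 32 action classes, as in the VALC dataset
pooled, alpha = class_specific_attention_pooling(feats, w_att)
print(pooled.shape)  # (32, 64)
```

Each class thus attends to the frames most indicative of it, which is one plausible way to capture transitions in semantic traits over a long video; the actual HAANet mechanism may differ.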
Pages: 2546-2556
Page count: 11
Related Papers
50 in total
  • [1] LEARNING DEEP TRAJECTORY DESCRIPTOR FOR ACTION RECOGNITION IN VIDEOS USING DEEP NEURAL NETWORKS
    Shi, Yemin
    Zeng, Wei
    Huang, Tiejun
    Wang, Yaowei
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2015,
  • [2] Procedural Generation of Videos to Train Deep Action Recognition Networks
    Roberto de Souza, Cesar
    Gaidon, Adrien
    Cabon, Yohann
    Manuel Lopez, Antonio
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 2594 - 2604
  • [3] Integrating Facial Expression and Body Gesture in Videos for Emotion Recognition
    Yan, Jingjie
    Zheng, Wenming
    Xin, Minhai
    Yan, Jingwei
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (03): 610 - 613
  • [4] Multimodal Emotion Recognition Using Deep Networks
    Fadil, C.
    Alvarez, R.
    Martinez, C.
    Goddard, J.
    Rufiner, H.
    [J]. VI LATIN AMERICAN CONGRESS ON BIOMEDICAL ENGINEERING (CLAIB 2014), 2014, 49 : 813 - 816
  • [5] Human action recognition in videos with articulated pose information by deep networks
    M. Farrajota
    João M. F. Rodrigues
    J. M. H. du Buf
    [J]. Pattern Analysis and Applications, 2019, 22 : 1307 - 1318
  • [6] Human action recognition in videos with articulated pose information by deep networks
    Farrajota, M.
    Rodrigues, Joao M. F.
    du Buf, J. M. H.
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2019, 22 (04) : 1307 - 1318
  • [7] Deep ChaosNet for Action Recognition in Videos
    Chen, Huafeng
    Zhang, Maosheng
    Gao, Zhengming
    Zhao, Yunhong
    [J]. COMPLEXITY, 2021, 2021
  • [8] Hand Gesture Recognition Using Deep Convolutional Neural Networks
    Strezoski, Gjorgji
    Stojanovski, Dario
    Dimitrovski, Ivica
    Madjarov, Gjorgji
    [J]. ICT INNOVATIONS 2016: COGNITIVE FUNCTIONS AND NEXT GENERATION ICT SYSTEMS, 2018, 665 : 49 - 58
  • [9] Visual Emotion Recognition Using Deep Neural Networks
    Iliev, Alexander I.
    Mote, Ameya
    [J]. DIGITAL PRESENTATION AND PRESERVATION OF CULTURAL AND SCIENTIFIC HERITAGE, 2022, 12 : 77 - 88
  • [10] Emotion Recognition Using Pretrained Deep Neural Networks
    Dobes, Marek
    Sabolova, Natalia
    [J]. ACTA POLYTECHNICA HUNGARICA, 2023, 20 (04) : 195 - 204