Emotion and Gesture Guided Action Recognition in Videos Using Supervised Deep Networks

Cited by: 1
|
Authors
Nigam, Nitika [1 ]
Dutta, Tanima [1 ]
Affiliations
[1] Indian Inst Technol BHU Varanasi, Varanasi 221005, Uttar Pradesh, India
Keywords
Videos; Feature extraction; Visualization; Spatiotemporal phenomena; Convolution; Tensors; Emotion recognition; Action recognition; deep neural networks (DNNs); long temporal context; Visual Attention with Long-term Context (VALC) dataset; LINK; visual attention;
DOI
10.1109/TCSS.2022.3187198
CLC number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Emotions and gestures are essential elements in improving social intelligence and predicting real human action. In recent years, recognition of human visual actions using deep neural networks (DNNs) has gained wide popularity in multimedia and computer vision. However, ambiguous action classes, such as "praying" and "pleading," are still challenging to classify because their visual cues are similar. Correct classification of ambiguous actions requires attending to the associated features of facial expressions and gestures, together with the long-term context of a video. This article proposes an attention-aware DNN named human action attention network (HAANet) that can capture long-term temporal context to recognize actions in videos. The visual attention network extracts discriminative features of facial expressions and gestures in the spatial and temporal dimensions. We further consolidate a class-specific attention pooling mechanism to capture transitions in semantic traits over time. The efficacy of HAANet is demonstrated on five benchmark datasets. To the best of our knowledge, no publicly available dataset in the literature distinguishes ambiguous human actions by focusing on the visual cues of the human in action. This motivated us to create a new dataset, known as Visual Attention with Long-term Context (VALC), which contains 32 actions with about 101 videos per class and an average video length of 30 s. HAANet outperforms state-of-the-art methods on the UCF101, ActivityNet, and Breakfast Actions datasets in terms of accuracy.
Pages: 2546-2556
Number of pages: 11
Related papers
50 items in total
  • [21] Multi-stream with Deep Convolutional Neural Networks for Human Action Recognition in Videos
    Liu, Xiao
    Yang, Xudong
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2018), PT I, 2018, 11301 : 251 - 262
  • [22] Speech Emotion Recognition Using Semi-supervised Learning with Ladder Networks
    Huang, Jian
    Li, Ya
    Tao, Jianhua
    Lian, Zheng
    Niu, Mingyue
    Yi, Jiangyan
    [J]. 2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [23] Human Action Recognition Using Deep Neural Networks
    Koli, Rashmi R.
    Bagban, Tanveer I.
    [J]. PROCEEDINGS OF THE 2020 FOURTH WORLD CONFERENCE ON SMART TRENDS IN SYSTEMS, SECURITY AND SUSTAINABILITY (WORLDS4 2020), 2020, : 376 - 380
  • [24] Emotion recognition in talking-face videos using persistent entropy and neural networks
    Paluzo-Hidalgo, Eduardo
    Gonzalez-Diaz, Rocio
    Aguirre-Carrazana, Guillermo
    [J]. ELECTRONIC RESEARCH ARCHIVE, 2022, 30 (02): : 644 - 660
  • [25] Boosting VLAD with Double Assignment using Deep Features for Action Recognition in Videos
    Duta, Ionut C.
    Nguyen, Tuan A.
    Aizawa, Kiyoharu
    Ionescu, Bogdan
    Sebe, Nicu
    [J]. 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2210 - 2215
  • [26] Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos
    Pfister, Tomas
    Simonyan, Karen
    Charles, James
    Zisserman, Andrew
    [J]. COMPUTER VISION - ACCV 2014, PT I, 2015, 9003 : 538 - 552
  • [27] Speech Emotion Recognition using Supervised Deep Recurrent System for Mental Health Monitoring
    Elsayed, Nelly
    ElSayed, Zag
    Asadizanjani, Navid
    Ozer, Murat
    Abdelgawad, Ahmed
    Bayoumi, Magdy
    [J]. 2022 IEEE 8TH WORLD FORUM ON INTERNET OF THINGS, WF-IOT, 2022,
  • [28] Semantic Segmentation based Hand Gesture Recognition using Deep Neural Networks
    Dutta, H. Pallab Jyoti
    Sarma, Debajit
    Bhuyan, M. K.
    Laskar, R. H.
    [J]. 2020 TWENTY SIXTH NATIONAL CONFERENCE ON COMMUNICATIONS (NCC 2020), 2020,
  • [29] HUMAN ACTIVITY DETECTION AND ACTION RECOGNITION IN VIDEOS USING CONVOLUTIONAL NEURAL NETWORKS
    Basavaiah, Jagadeesh
    Patil, Chandrashekar Mohan
    [J]. JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2020, 19 (02): : 157 - 183
  • [30] A Bilingual Emotion Recognition System Using Deep Learning Neural Networks
    Absa, Ahmed H. Abo
    Deriche, M.
    Mohandes, M.
    [J]. 2018 15TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS AND DEVICES (SSD), 2018, : 1241 - 1245