SWTA: Sparse Weighted Temporal Attention for Drone-Based Activity Recognition

Cited by: 0
Authors
Yadav, Santosh Kumar [1 ]
Pahwa, Esha [2 ]
Luthra, Achleshwar [2 ]
Tiwari, Kamlesh [2 ]
Pandey, Hari Mohan [3 ]
Affiliations
[1] Natl Univ Ireland, Coll Sci & Engn, Galway H91 TK33, Ireland
[2] Birla Inst Technol & Sci, Dept CSIS, Pilani 333031, Rajasthan, India
[3] Bournemouth Univ, Comp & Informat, Poole BH12 5BB, Dorset, England
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023
Keywords
Human Activity Recognition; Video Understanding; Drone Action Recognition; Classification
DOI
10.1109/IJCNN54540.2023.10191750
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Drone-camera based human activity recognition (HAR) has received significant attention from the computer vision research community in the past few years. A robust and efficient HAR system plays a pivotal role in fields such as video surveillance, crowd behavior analysis, sports analysis, and human-computer interaction. The task is made challenging by complex poses, varying viewpoints, and the environmental scenarios in which the action takes place. To address these complexities, this paper proposes a novel Sparse Weighted Temporal Attention (SWTA) module that uses sparsely sampled video frames to obtain global weighted temporal attention. The proposed SWTA has two components: a temporal segment network that sparsely samples a given set of frames, and a weighted temporal attention mechanism that fuses attention maps derived from optical flow with the raw RGB images. This is followed by a basenet comprising a convolutional neural network (CNN) module and fully connected layers that perform activity recognition. SWTA can be used as a plug-in module for existing deep CNN architectures, enabling them to learn temporal information without the need for a separate temporal stream. It has been evaluated on three publicly available benchmark datasets, namely Okutama, MOD20, and Drone-Action, achieving accuracies of 72.76%, 92.56%, and 78.86%, respectively, and surpassing the previous state-of-the-art performances by margins of 25.26%, 18.56%, and 2.94%.
Pages: 8
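
The abstract describes SWTA as sparse temporal sampling followed by a weighted fusion of a flow-derived attention map with the raw RGB frames, fed to a CNN basenet with fully connected layers. Below is a minimal PyTorch sketch of that pipeline, assembled only from the abstract's description: the class names (SWTAClassifier, WeightedTemporalAttention), the sigmoid-of-flow-magnitude attention map, the learnable fusion weight, the tiny two-layer backbone, and the segment-averaged classifier head are all illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the two SWTA components described in the abstract:
# (1) sparse temporal sampling of frames (temporal-segment style) and
# (2) weighted temporal attention that fuses a flow-derived attention map
# with the raw RGB frames before a CNN basenet. Illustrative only.
import torch
import torch.nn as nn


def sparse_sample(frames: torch.Tensor, num_segments: int) -> torch.Tensor:
    """Pick one frame from each of `num_segments` equal temporal chunks.

    frames: (B, T, C, H, W) video tensor; returns (B, num_segments, C, H, W).
    Deterministic midpoint variant; training code would typically sample
    uniformly at random within each segment instead.
    """
    b, t, c, h, w = frames.shape
    idx = torch.linspace(0, t - 1, steps=2 * num_segments + 1)[1::2].long()
    return frames[:, idx]


class WeightedTemporalAttention(nn.Module):
    """Fuse a flow-derived attention map with RGB frames (assumed formula)."""

    def __init__(self, alpha: float = 0.5):
        super().__init__()
        # Learnable weight balancing raw RGB and attention-modulated RGB.
        self.alpha = nn.Parameter(torch.tensor(alpha))

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # rgb:  (B, N, 3, H, W) sparsely sampled frames
        # flow: (B, N, 2, H, W) optical flow for the same frames (assumed given)
        attn = torch.sigmoid(flow.norm(dim=2, keepdim=True))  # (B, N, 1, H, W)
        return self.alpha * rgb + (1 - self.alpha) * rgb * attn


class SWTAClassifier(nn.Module):
    """SWTA module in front of a small CNN 'basenet' with an FC head."""

    def __init__(self, num_classes: int, num_segments: int = 8):
        super().__init__()
        self.num_segments = num_segments
        self.wta = WeightedTemporalAttention()
        self.basenet = nn.Sequential(          # stand-in for a deep CNN backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, frames: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        rgb = sparse_sample(frames, self.num_segments)
        flo = sparse_sample(flow, self.num_segments)
        x = self.wta(rgb, flo)                          # (B, N, 3, H, W)
        b, n = x.shape[:2]
        feats = self.basenet(x.flatten(0, 1))           # (B*N, 64)
        logits = self.fc(feats).view(b, n, -1).mean(1)  # average over segments
        return logits


if __name__ == "__main__":
    video = torch.randn(2, 64, 3, 112, 112)  # dummy clip: 64 RGB frames
    flow = torch.randn(2, 64, 2, 112, 112)   # dummy optical flow
    model = SWTAClassifier(num_classes=13)
    print(model(video, flow).shape)          # torch.Size([2, 13])
```

In practice the toy two-layer backbone would be replaced by a pretrained deep CNN, which is the plug-in usage the abstract describes: the SWTA fusion supplies temporal cues so that no separate temporal stream is needed.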