Three-stream spatio-temporal attention network for first-person action and interaction recognition

Authors
Javed Imran
Balasubramanian Raman
Affiliations
[1] University of Petroleum and Energy Studies, Department of Informatics, School of Computer Science
[2] Indian Institute of Technology Roorkee, Department of Computer Science and Engineering
Keywords
First-person action recognition; 3D convolutional neural network; Recurrent neural network; Feature fusion; Soft attention
DOI
Not available
Abstract
Recognizing human actions and interactions from a first-person viewpoint is an interesting area of research within human action recognition (HAR). This paper presents a data-driven spatio-temporal network that uses a temporal attention mechanism to combine different modalities computed from first-person videos. First, the proposed approach uses a three-stream inflated 3D ConvNet (I3D) to extract low-level features from RGB frame difference (FD), optical flow (OF) and magnitude-orientation (MO) streams. An I3D network has the advantage of learning spatio-temporal features directly over short video snippets (e.g., 16 frames). Second, the extracted features are fused and fed to a bidirectional long short-term memory (BiLSTM) network to model high-level temporal feature sequences. Third, we incorporate an attention mechanism into the BiLSTM network to automatically select the most relevant temporal snippets in a given video sequence. Finally, extensive experiments achieve state-of-the-art results on the JPL (98.5%), NUS (84.1%), UTK (91.5%) and DogCentric (83.3%) datasets. These results show that the features extracted by the three streams are complementary to each other. They also show that the attention mechanism further improves the results, by a large margin over previous approaches based on handcrafted and deep features.
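The pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and every function name below is hypothetical.

The sketch makes the following assumptions:
- **FD stream:** frame difference is taken on single-channel frames for brevity. For RGB you would apply it per channel.
- **MO stream:** the magnitude-orientation input is derived by converting each optical-flow vector to polar form.
- **Snippets:** these are non-overlapping 16-frame windows.
- **Fusion:** per-snippet stream features are concatenated. The abstract does not specify the fusion operator.
- **Attention:** soft attention scores each BiLSTM hidden state as e_t = v · tanh(W h_t), a common additive form assumed here. The BiLSTM itself and the I3D feature extractors are not implemented; their outputs are taken as given vectors.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Frame = List[List[float]]  # single-channel frame: rows of pixel intensities


def frame_difference(frames: Sequence[Frame]) -> List[Frame]:
    """FD input (assumed form): pixel-wise difference of consecutive frames."""
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diffs.append([[c - p for p, c in zip(row_p, row_c)]
                      for row_p, row_c in zip(prev, curr)])
    return diffs


def magnitude_orientation(u: float, v: float) -> Tuple[float, float]:
    """MO input (assumed form): optical-flow vector (u, v) in polar coordinates."""
    return math.hypot(u, v), math.atan2(v, u)


def split_into_snippets(num_frames: int, snippet_len: int = 16) -> List[range]:
    """Non-overlapping snippet frame ranges; a trailing partial snippet is dropped."""
    return [range(s, s + snippet_len)
            for s in range(0, num_frames - snippet_len + 1, snippet_len)]


def fuse(fd: Vector, of: Vector, mo: Vector) -> Vector:
    """Hypothetical fusion: concatenate the three per-snippet stream features."""
    return fd + of + mo


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden: Sequence[Vector], W: Sequence[Vector],
                       v: Vector) -> Tuple[Vector, Vector]:
    """Soft attention over per-snippet hidden states h_t (assumed BiLSTM outputs).

    e_t = v . tanh(W h_t);  alpha = softmax(e);  context = sum_t alpha_t * h_t
    """
    scores = []
    for h in hidden:
        projected = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)))
                     for row in W]
        scores.append(sum(v_i * p_i for v_i, p_i in zip(v, projected)))
    alphas = softmax(scores)
    dim = len(hidden[0])
    # Weighted sum of hidden states: snippets with higher scores contribute more.
    context = [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]
    return alphas, context


if __name__ == "__main__":
    print(frame_difference([[[0.0, 1.0]], [[2.0, 1.0]], [[2.0, 4.0]]]))
    print(len(split_into_snippets(40)))  # two full 16-frame snippets
    alphas, context = temporal_attention([[1.0, 0.0], [0.0, 1.0]],
                                         [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
    print(alphas, context)
```

In this toy call, the first hidden state aligns with `v`, so it receives the larger attention weight. The context vector is therefore pulled toward it. This mirrors, in miniature, how attention lets the classifier emphasize the most relevant snippets rather than averaging all of them equally.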
Pages: 1137-1152
Page count: 15