Three-stream spatio-temporal attention network for first-person action and interaction recognition

Cited by: 5
Authors
Imran, Javed [1]
Raman, Balasubramanian [2]
Affiliations
[1] Univ Petr & Energy Studies, Dept Informat, Sch Comp Sci, Dehra Dun, Uttarakhand, India
[2] Indian Inst Technol Roorkee, Dept Comp Sci & Engn, Roorkee, Uttarakhand, India
Keywords
First-person action recognition; 3D convolutional neural network; Recurrent neural network; Feature fusion; Soft attention
DOI
10.1007/s12652-021-02940-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing human actions and interactions from a first-person viewpoint is an active research area within human action recognition (HAR). This paper presents a data-driven spatio-temporal network that combines different modalities computed from first-person videos using a temporal attention mechanism. First, our proposed approach uses a three-stream inflated 3D ConvNet (I3D) to extract low-level features from the RGB frame difference (FD), optical flow (OF), and magnitude-orientation (MO) streams. An I3D network can learn spatio-temporal features directly from short video snippets (e.g., 16 frames). Second, the extracted features are fused and fed to a bidirectional long short-term memory (BiLSTM) network to model high-level temporal feature sequences. Third, we incorporate an attention mechanism into our BiLSTM network to automatically select the most relevant temporal snippets in a given video sequence. Finally, we conducted extensive experiments and achieved state-of-the-art results on the JPL (98.5%), NUS (84.1%), UTK (91.5%), and DogCentric (83.3%) datasets. These results show that the features extracted by the three-stream network are complementary, and that the attention mechanism further improves the results, outperforming previous approaches based on handcrafted and deep features by a large margin.
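
The fusion and temporal-attention steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the streams are fused or how attention scores are computed, so the concatenation in `fuse`, the dot-product scoring vector `w`, and all function names are assumptions chosen for illustration. The I3D and BiLSTM stages are omitted; the fused vectors simply stand in for per-snippet BiLSTM outputs.

```python
import math

def fuse(fd, of, mo):
    """Concatenate per-snippet features from the three streams.
    Concatenation is an assumed fusion rule, not taken from the paper."""
    return [a + b + c for a, b, c in zip(fd, of, mo)]

def soft_attention_pool(states, w):
    """Soft temporal attention: score each snippet, softmax the scores,
    and return the weights plus the weighted sum of the states.
    The dot-product scorer `w` is a hypothetical stand-in for a learned layer."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, states)) for d in range(dim)]
    return alphas, context

# Toy input: three snippets, one feature value per stream (made-up numbers).
fd = [[0.1], [0.9], [0.2]]
of = [[0.2], [0.8], [0.1]]
mo = [[0.0], [0.7], [0.3]]

fused = fuse(fd, of, mo)  # three snippets, each a 3-dimensional vector
alphas, context = soft_attention_pool(fused, w=[1.0, 1.0, 1.0])
print(alphas, context)
```

In this toy input the second snippet has the largest features, so it receives the largest attention weight and dominates the pooled vector that a classifier would consume.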
Pages: 1137-1152
Number of pages: 16
Related papers
50 in total
  • [1] Three-stream spatio-temporal attention network for first-person action and interaction recognition
    Javed Imran
    Balasubramanian Raman
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2022, 13 : 1137 - 1152
  • [2] Three-stream fusion network for first-person interaction recognition
    Kim, Ye-Ji
    Lee, Dong-Gyu
    Lee, Seong-Whan
    [J]. PATTERN RECOGNITION, 2020, 103
  • [3] First-Person Activity Recognition Based on Three-Stream Deep Features
    Kim, Ye-Ji
    Lee, Dong-Gyu
    Lee, Seong-Whan
    [J]. 2018 18TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS), 2018, : 297 - 299
  • [4] Traffic Accident Recognition in First-Person Videos by Learning a Spatio-Temporal Visual Pattern
    Park, Kyung Ho
    Ahn, Dong Hyun
    Kim, Huy Kang
    [J]. 2021 IEEE 93RD VEHICULAR TECHNOLOGY CONFERENCE (VTC2021-SPRING), 2021,
  • [5] Two-person interaction recognition based on multi-stream spatio-temporal fusion network
    Pei, Xiaomin
    Fan, Huijie
    Tang, Yandong
    [J]. Hongwai yu Jiguang Gongcheng/Infrared and Laser Engineering, 2020, 49 (05):
  • [6] Action and Interaction Recognition in First-person videos
    Narayan, Sanath
    Kankanhalli, Mohan S.
    Ramakrishnan, Kalpathi R.
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2014, : 526 - +
  • [7] A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention
    Yang, Qi
    Lu, Tongwei
    Zhou, Huabing
    [J]. ENTROPY, 2022, 24 (03)
  • [8] SPATIO-TEMPORAL SLOWFAST SELF-ATTENTION NETWORK FOR ACTION RECOGNITION
    Kim, Myeongjun
    Kim, Taehun
    Kim, Daijin
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2206 - 2210
  • [9] Dual Stream Spatio-Temporal Motion Fusion With Self-Attention For Action Recognition
    Jalal, Md Asif
    Aftab, Waqas
    Moore, Roger K.
    Mihaylova, Lyudmila
    [J]. 2019 22ND INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2019), 2019,
  • [10] Multi-Modal Three-Stream Network for Action Recognition
    Khalid, Muhammad Usman
    Yu, Jie
    [J]. 2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 3210 - 3215