Attention-based network for effective action recognition from multi-view video

Cited by: 2
Authors
Hoang-Thuyen Nguyen [1]
Thi-Oanh Nguyen [1]
Affiliations
[1] Hanoi Univ Sci & Technol HUST, SoICT, 1 Dai Co Viet St, Hanoi, Vietnam
Keywords
human action recognition; cross-view attention; multi-view; cross-view; multi-camera; multi-branch; FOCUS;
DOI
10.1016/j.procs.2021.08.100
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Human action recognition systems face many challenges, such as background clutter, partial occlusion, lighting, viewpoint, and execution rate. Complementary information from different views can mitigate the viewpoint-change and occlusion problems. The open question is how to integrate the information from multi-view images effectively. In this paper, we propose an effective approach to multi-view human action recognition. The approach uses an attention mechanism to pass discriminative features between views. It is designed as a multi-branch network in which each branch is responsible for extracting a view-specific feature. Furthermore, we build a cross-view attention module that enhances action recognition by transferring knowledge between views (branches). Experiments on three datasets show that the proposed solution works effectively in different scenarios. Our models achieve the best results on two datasets (NUMA and MicaHandGesture) for both cross-subject and cross-view evaluations. On the NUMA dataset, the accuracy of our best models reaches 99.56% and 92.74% in the cross-subject and cross-view evaluation scenarios, respectively. On the MicaHandGesture dataset, the accuracies are 99.06% and 91.71% in the two scenarios, respectively. These results surpass previous works such as Multi-Branch TSN with GRU [5] (93.81% in cross-subject evaluation and 84.4% in cross-view evaluation on NUMA) and DA-Net [31] (92.1% in cross-subject evaluation (video-level) and 84.2% in cross-view evaluation on NUMA). We also obtained very promising results on the large-scale NTU RGB+D dataset. (C) 2021 The Authors. Published by Elsevier B.V.
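The abstract describes a multi-branch network with a cross-view attention module but gives no equations for it. The following is only a minimal sketch of the general idea, assuming scaled dot-product attention between per-view feature vectors, a residual sum, and late fusion by score averaging. None of these choices is confirmed by the record, and the function names `cross_view_attention` and `fuse_scores` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_view_attention(features):
    """Enhance each view's feature with information from the other views.

    features: one feature vector per view (branch); at least two views.
    Assumption: scaled dot-product attention plus a residual connection.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    enhanced = []
    for i, query in enumerate(features):
        # The current view attends to every other view.
        others = [f for j, f in enumerate(features) if j != i]
        weights = softmax([dot(query, o) / scale for o in others])
        # Attention-weighted sum of the other views' features.
        context = [sum(w * o[d] for w, o in zip(weights, others))
                   for d in range(dim)]
        # Residual sum keeps the view-specific feature.
        enhanced.append([q + c for q, c in zip(query, context)])
    return enhanced


def fuse_scores(per_view_scores):
    """Average class scores over views (late fusion, an assumption)."""
    n = len(per_view_scores)
    return [sum(col) / n for col in zip(*per_view_scores)]


# Toy view-specific features from three hypothetical branches.
views = [[1.0, 0.0, 0.5, 0.2],
         [0.9, 0.1, 0.4, 0.3],
         [0.0, 1.0, 0.1, 0.8]]
enhanced = cross_view_attention(views)
```

In a real model the features would come from learned video backbones and the attention would use learned projections; this toy version only shows how one branch can borrow information from the others before the per-view predictions are fused.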
Pages: 971 - 980
Number of pages: 10
Related papers
50 in total
  • [1] Action Recognition with a Multi-View Temporal Attention Network
    Sun, Dengdi
    Su, Zhixiang
    Ding, Zhuanlian
    Luo, Bin
    [J]. COGNITIVE COMPUTATION, 2022, 14 (03) : 1082 - 1095
  • [3] An Attention-based Collaboration Framework for Multi-View Network Representation Learning
    Qu, Meng
    Tang, Jian
    Shang, Jingbo
    Ren, Xiang
    Zhang, Ming
    Han, Jiawei
    [J]. CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 1767 - 1776
  • [4] Silhouette-Based Multi-View Human Action Recognition in Video
    Aryanfar, Alihossein
    Yaakob, Razali
    Halin, Alfian Abdul
    Sulaiman, Md Nasir
    Kasmiran, Khairul Azhar
    [J]. 2014 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND TECHNOLOGY (ICCST), 2014,
  • [5] View knowledge transfer network for multi-view action recognition
    Liang, Zixi
    Yin, Ming
    Gao, Junli
    He, Yicheng
    Huang, Weitian
    [J]. IMAGE AND VISION COMPUTING, 2022, 118
  • [6] Multi-View Hierarchical Bidirectional Recurrent Neural Network for Depth Video Sequence Based Action Recognition
    Liu, Xueping
    Li, Yibo
    Wang, Qingjun
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2018, 32 (10)
  • [7] Attention-Based Multi-Modal Multi-View Fusion Approach for Driver Facial Expression Recognition
    Chen, Jianrong
    Dey, Sujit
    Wang, Lei
    Bi, Ning
    Liu, Peng
    [J]. IEEE Access, 2024, 12 : 137203 - 137221
  • [8] Orthogonal channel attention-based multi-task learning for multi-view facial expression recognition
    Chen, Jingying
    Yang, Lei
    Tan, Lei
    Xu, Ruyi
    [J]. PATTERN RECOGNITION, 2022, 129
  • [9] Dividing and Aggregating Network for Multi-view Action Recognition
    Wang, Dongang
    Ouyang, Wanli
    Li, Wen
    Xu, Dong
    [J]. COMPUTER VISION - ECCV 2018, PT IX, 2018, 11213 : 457 - 473
  • [10] Attention-based Deep Reinforcement Learning for Multi-view Environments
    Barati, Elaheh
    Chen, Xuewen
    Zhong, Zichun
    [J]. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1805 - 1807