Fine-grained action recognition using multi-view attentions

Authors
Yisheng Zhu
Guangcan Liu
Affiliation
[1] Nanjing University of Information Science and Technology
Source
The Visual Computer | 2020 / Vol. 36
Keywords
Multi-view attention; Action recognition; Deep neural networks
DOI
Not available
Abstract
Inflated 3D ConvNet (I3D) uses 3D convolution to enrich the semantic information of features, forming a strong baseline for human action recognition. However, 3D convolution extracts features by mixing spatial, temporal and cross-channel information together, so it cannot emphasize meaningful features along specific dimensions. This applies especially to cross-channel information, which is crucial for recognizing fine-grained actions. In this paper, we propose a novel multi-view attention mechanism, named the channel–spatial–temporal attention (CSTA) block, to guide the network to pay more attention to the clues useful for fine-grained action recognition. Specifically, CSTA consists of three branches: a channel–spatial branch, a channel–temporal branch and a spatial–temporal branch. By directly plugging these branches into I3D, we further explore how the location and the number of blocks affect recognition accuracy. We also examine two different strategies for combining multiple CSTA blocks. Extensive experiments demonstrate the effectiveness of CSTA. When trained on RGB frames only, I3D equipped with CSTA (I3D–CSTA) achieves accuracies of 95.76% and 73.97% on UCF101 and HMDB51, respectively. These results are comparable with those of methods that use both RGB frames and optical flow. With the assistance of optical flow, the recognition accuracies of I3D–CSTA rise to 98.2% on UCF101 and 82.9% on HMDB51, outperforming many state-of-the-art methods.
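The abstract names the three branches but does not say how each one computes its attention, so the following is only an illustrative sketch of the multi-view idea, not the paper's CSTA block. It assumes a feature tensor indexed as `feat[c][t][h][w]`, and it assumes that each branch mean-pools over the axes it does not cover, applies a sigmoid gate, and that the three gates are averaged and applied with a residual connection. The function name `multi_view_attention` and all of these design choices are hypothetical; the real branches are presumably learned layers.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_view_attention(feat):
    """Reweight feat[c][t][h][w] with three pooled attention views.

    Assumption (not from the paper): each view mean-pools over the axes it
    does not cover and gates with a sigmoid; no learned weights are used.
    """
    C, T = len(feat), len(feat[0])
    H, W = len(feat[0][0]), len(feat[0][0][0])

    # Channel-spatial view: pool over time, giving a mask cs[c][h][w].
    cs = [[[sigmoid(sum(feat[c][t][h][w] for t in range(T)) / T)
            for w in range(W)] for h in range(H)] for c in range(C)]

    # Channel-temporal view: pool over space, giving a mask ct[c][t].
    ct = [[sigmoid(sum(feat[c][t][h][w]
                       for h in range(H) for w in range(W)) / (H * W))
           for t in range(T)] for c in range(C)]

    # Spatial-temporal view: pool over channels, giving a mask st[t][h][w].
    st = [[[sigmoid(sum(feat[c][t][h][w] for c in range(C)) / C)
            for w in range(W)] for h in range(H)] for t in range(T)]

    # Average the three gates and apply them with a residual connection,
    # so the output keeps the input shape.
    return [[[[feat[c][t][h][w]
               * (1.0 + (cs[c][h][w] + ct[c][t] + st[t][h][w]) / 3.0)
               for w in range(W)] for h in range(H)]
             for t in range(T)] for c in range(C)]
```

Because the output has the same shape as the input, a block of this kind could in principle be placed between existing layers of a 3D network, which is the sort of placement the abstract explores.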
Pages: 1771-1781
Page count: 10