Dilated Multi-Temporal Modeling for Action Recognition

Cited by: 0
Authors
Zhang, Tao [1 ]
Wu, Yifan [1 ]
Li, Xiaoqiang [1 ]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 12
Keywords
computer vision; action recognition; multiple temporal modeling; dilated convolution;
DOI
10.3390/app13126934
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Action recognition requires capturing temporal information from video clips, yet the duration of the same action varies from video to video. Because temporal context spans diverse scales, the uniform-size kernels used in convolutional neural networks (CNNs) limit multi-scale temporal modeling. In this paper, we propose a novel dilated multi-temporal (DMT) module for modeling multi-temporal information in action recognition. By applying dilated convolutions with different dilation rates to different feature map channels, the DMT module captures information at multiple scales without the costly multi-branch networks, input-level frame pyramids, or feature map stacking that previous works usually require. This approach therefore integrates temporal information from multiple scales. In addition, the DMT module can be inserted into existing 2D CNNs, making it a straightforward and intuitive solution to the challenge of multi-temporal modeling. The proposed method improves accuracy by about 2% and 1% on FineGym99 and SthV1, respectively. An empirical analysis shows how DMT improves classification accuracy for action classes with varying durations.
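The central idea in the abstract, different dilation rates applied to different feature map channels, can be illustrated with a minimal pure-Python sketch. This is not the authors' DMT implementation: the function names, the contiguous channel grouping, the shared 3-tap kernel, and the zero padding are all assumptions chosen for illustration.

```python
def dilated_temporal_conv(signal, kernel, dilation):
    """1D dilated convolution along time, zero-padded so output length == input length."""
    center = len(kernel) // 2
    length = len(signal)
    out = []
    for t in range(length):
        acc = 0.0
        for i, weight in enumerate(kernel):
            src = t + (i - center) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= src < length:              # positions outside the clip count as zero
                acc += weight * signal[src]
        out.append(acc)
    return out


def multi_dilation_channels(features, kernel, dilations):
    """Split channels into contiguous groups; each group uses its own dilation rate.

    `features` is a list of channels, each a list of T per-frame values.
    The grouping scheme is a hypothetical choice, not taken from the paper.
    """
    if len(features) % len(dilations) != 0:
        raise ValueError("channel count must be divisible by the number of dilations")
    per_group = len(features) // len(dilations)
    return [
        dilated_temporal_conv(channel, kernel, dilations[c // per_group])
        for c, channel in enumerate(features)
    ]


def receptive_field(kernel_size, dilation):
    """Temporal extent (in frames) covered by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


# Four channels, each holding a single impulse at frame 4 of a 9-frame clip.
impulse = [0.0] * 9
impulse[4] = 1.0
features = [list(impulse) for _ in range(4)]

# Channels 0-1 use dilation 1, channels 2-3 use dilation 2.
outputs = multi_dilation_channels(features, kernel=[1.0, 1.0, 1.0], dilations=[1, 2])
print(outputs[0])  # response spread over frames 3, 4, 5
print(outputs[2])  # response spread over frames 2, 4, 6
```

In this sketch a single layer mixes a 3-frame and a 5-frame temporal extent at the cost of one convolution per channel, which is the kind of saving over multi-branch designs that the abstract describes.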
Pages: 15
Related papers
50 in total
  • [31] A spatial attentive and temporal dilated (SATD) GCN for skeleton-based action recognition
    Zhang, Jiaxu
    Ye, Gaoxiang
    Tu, Zhigang
    Qin, Yongtao
    Qin, Qianqing
    Zhang, Jinlu
    Liu, Jun
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2022, 7 (01) : 46 - 55
  • [32] Modeling for multi-temporal cyanobacterial bloom dominance and distributions using landsat imagery
    Isenstein, Elizabeth M.
    Kim, Daeyoung
    Park, Mi-Hyun
    ECOLOGICAL INFORMATICS, 2020, 59
  • [33] Modeling Multi-Label Action Dependencies for Temporal Action Localization
    Tirupattur, Praveen
    Duarte, Kevin
    Rawat, Yogesh S.
    Shah, Mubarak
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1460 - 1470
  • [34] Rice Recognition Using Multi-temporal and Dual Polarized Synthetic Aperture Radar Images
    Chen, Henglin
    Li, Huiguo
    2008 ISECS INTERNATIONAL COLLOQUIUM ON COMPUTING, COMMUNICATION, CONTROL, AND MANAGEMENT, VOL 1, PROCEEDINGS, 2008, : 96 - 100
  • [35] Multi-temporal Anomaly Detection Technique
    Dayan, I
    Maman, S.
    Blumberg, D. G.
    Rotman, S.
    ELECTRO-OPTICAL AND INFRARED SYSTEMS: TECHNOLOGY AND APPLICATIONS XIII, 2016, 9987
  • [36] Predictive mining of multi-temporal relations
    Amico, Beatrice
    Combi, Carlo
    Rizzi, Romeo
    Sala, Pietro
    INFORMATION AND COMPUTATION, 2024, 301
  • [37] Long-Short Temporal Modeling for Efficient Action Recognition
    Wu, Liyu
    Zou, Yuexian
    Zhang, Can
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2021, 2021-June : 2435 - 2439
  • [38] MULTI-TEMPORAL SPATIAL DATA AFRICA
    Becker, R.
    37TH INTERNATIONAL SYMPOSIUM ON REMOTE SENSING OF ENVIRONMENT, 2017, 42-3 (W2): : 27 - 29
  • [39] THE INITIATION OF COERCION - A MULTI-TEMPORAL ANALYSIS
    MCDOUGAL, MS
    FELICIANO, FP
    AMERICAN JOURNAL OF INTERNATIONAL LAW, 1958, 52 (02) : 241 - 259
  • [40] LONG-SHORT TEMPORAL MODELING FOR EFFICIENT ACTION RECOGNITION
    Wu, Liyu
    Zou, Yuexian
    Zhang, Can
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2435 - 2439