Dilated Multi-Temporal Modeling for Action Recognition

Times cited: 0

Authors:

Zhang, Tao [1]

Wu, Yifan [1]

Li, Xiaoqiang [1]

Affiliation:

[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 12

Keywords:

computer vision; action recognition; multiple temporal modeling; dilated convolution

DOI:

10.3390/app13126934

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Subject classification code:

0703

Abstract:

Action recognition involves capturing temporal information from video clips, and the duration of the same action varies from one video to another. Because temporal context spans diverse scales, the uniformly sized kernels used in convolutional neural networks (CNNs) limit their capability for multi-scale temporal modeling. In this paper, we propose a novel dilated multi-temporal (DMT) module for modeling multi-temporal information in action recognition. By applying dilated convolutions with different dilation rates to different feature-map channels, the DMT module captures information at multiple temporal scales without the costly multi-branch networks, input-level frame pyramids, or feature-map stacking that previous works have usually required. This approach therefore integrates temporal information from multiple scales within a single module. In addition, the DMT module can be inserted into existing 2D CNNs, making it a straightforward and intuitive solution to the challenge of multi-temporal modeling. The proposed method achieves promising results, with accuracy improvements of about 2% on FineGym99 and about 1% on Something-Something V1 (SthV1). We also conduct an empirical analysis showing how DMT improves classification accuracy for action classes of varying durations.
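The abstract describes the core mechanism only at a high level: feature-map channels are split so that each group is convolved along time with a different dilation rate. The following is a minimal pure-Python sketch of that idea under stated assumptions, not the authors' implementation. It assumes that a 2D CNN feature map has already been rearranged into a C × T array (channels by frames) for one spatial position, that channels are split into equal contiguous groups, that every channel has its own size-3 depthwise temporal kernel, and that the dilation rates (1, 2) are illustrative. The names `dilated_temporal_conv1d`, `dmt_like_temporal_mixing` and `receptive_field` are hypothetical.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of frames spanned by one dilated kernel: (k - 1) * d + 1."""
    return (kernel_size - 1) * dilation + 1


def dilated_temporal_conv1d(signal: Sequence[float], kernel: Sequence[float],
                            dilation: int) -> List[float]:
    """Dilated 1D convolution of one channel along time.

    Uses 'same' zero padding, so the output has as many frames as the input
    (for odd kernel sizes). Like most deep-learning frameworks, it computes a
    cross-correlation, i.e. the kernel is not flipped.
    """
    half = (len(kernel) - 1) // 2  # index of the centre tap
    num_frames = len(signal)
    out: List[float] = []
    for t in range(num_frames):
        acc = 0.0
        for i, w in enumerate(kernel):
            src = t + (i - half) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= src < num_frames:        # frames outside the clip count as zero
                acc += w * signal[src]
        out.append(acc)
    return out


def dmt_like_temporal_mixing(features: Sequence[Sequence[float]],
                             kernels: Sequence[Sequence[float]],
                             dilations: Sequence[int]) -> List[List[float]]:
    """Hypothetical DMT-style layer: each channel group gets its own dilation.

    features:  C x T array (channels by frames) for one spatial position (assumed layout).
    kernels:   one temporal kernel per channel (depthwise, assumed learned).
    dilations: one dilation rate per contiguous channel group; C must divide evenly.
    """
    num_channels = len(features)
    num_groups = len(dilations)
    if num_groups == 0 or num_channels % num_groups != 0:
        raise ValueError("channel count must be divisible by the number of dilation groups")
    group_size = num_channels // num_groups
    return [
        dilated_temporal_conv1d(features[c], kernels[c], dilations[c // group_size])
        for c in range(num_channels)
    ]


if __name__ == "__main__":
    impulse = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # single event at frame 4
    feats = [list(impulse) for _ in range(4)]            # 4 channels, 8 frames
    ones = [[1.0, 1.0, 1.0] for _ in range(4)]           # size-3 kernel per channel
    for row in dmt_like_temporal_mixing(feats, ones, dilations=(1, 2)):
        print(row)
```

In this toy run, channels 0 and 1 (dilation 1) spread the impulse over frames 3 to 5, while channels 2 and 3 (dilation 2) respond at frames 2, 4 and 6, covering a 5-frame span with the same three weights. One layer at the cost of a single depthwise convolution thus mixes two temporal scales, without extra branches or frame pyramids. How the actual DMT module chooses its rates, groups its channels, and fuses the result is not stated in this abstract.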

Pages: 15