STAR: Efficient SpatioTemporal Modeling for Action Recognition

Cited by: 2

Authors:

Kumar, Abhijeet [1]

Abrams, Samuel [1]

Kumar, Abhishek [1]

Narayanan, Vijaykrishnan [1]

Affiliations:

[1] Penn State Univ, EECS Dept, State Coll, PA 16802 USA

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2023 / Vol. 42 / Issue 02

Keywords:

Action recognition; Compressed domain; I-frames; Spatial-temporal 2D convolutional networks; DOMAIN;

DOI:

10.1007/s00034-022-02160-x

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Action recognition in video has gained significant attention over the past several years. While conventional 2D CNNs have found great success in understanding images, they are less effective at capturing the temporal relationships present in video. By contrast, 3D CNNs capture spatiotemporal information well, but they incur a high computational cost, making deployment challenging. In video, key information is typically confined to a small number of frames, yet many current approaches decompress and process all frames, which wastes resources. Others work directly in the compressed domain but require multiple input streams to understand the data. In our work, we operate directly on compressed video and extract information solely from intracoded frames (I-frames), avoiding the use of motion vectors and residuals for motion information, which makes this a single-stream network. This reduces processing time and energy consumption and, by extension, makes the approach accessible to a wider range of machines and uses. We test extensively on the UCF101 (Soomro et al. in UCF101: a dataset of 101 human actions classes from videos in the Wild, 2012) and HMDB51 (Kuehne et al., in: Jhuang, Garrote, Poggio, Serre (eds) Proceedings of the international conference on computer vision (ICCV), 2011) datasets to evaluate our framework and show that computational complexity is reduced significantly while accuracy remains competitive with existing compressed-domain efforts: 92.6% top-1 accuracy on UCF101 and 62.9% on HMDB51, with 24.3M parameters and 4 GFLOPS, and energy savings of over 11x on the two datasets versus CoViAR (Wu et al. in Compressed video action recognition, 2018).

Pages: 705-723

Page count: 19
