SPATIO-TEMPORAL MOTION AGGREGATION NETWORK FOR VIDEO ACTION DETECTION

Cited by: 3

Authors:

Zhang, Hongcheng [1]

Zhao, Xu [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

video understanding; video action detection; spatio-temporal action detection; anchor-free detector;

DOI:

10.1109/ICASSP43922.2022.9746817

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Recognizing action patterns and detecting action instances are vital for the spatio-temporal action detection task, which aims to recognize the actions of interest in untrimmed videos and localize them in both space and time. The mainstream action tubelet detectors, however, ignore the conflict between localization and classification features and use localization features for temporal modeling, which leads to ineffective action classification. In this paper, we propose the Spatio-Temporal Motion Aggregation mechanism, which integrates the local motion feature from a short-term snippet with longer spatio-temporal information to predict the action category. We design the Class-Agnostic Center Localization module to localize action instance centers in a class-agnostic manner. In addition, Movement and Size Regression is proposed for movement estimation and spatial extent detection, using Gaussian kernels to encode training samples. These three modules work together to generate tubelet detection results, which can be further linked into video-level tubes with a matching strategy. Our detector achieves state-of-the-art performance in both frame-mAP and video-mAP on the UCF-24 and JHMDB datasets.

Pages: 2180-2184

Page count: 5
