SPATIO-TEMPORAL MOTION AGGREGATION NETWORK FOR VIDEO ACTION DETECTION

Cited by: 3

Authors:

Zhang, Hongcheng [1]

Zhao, Xu [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

video understanding; video action detection; spatio-temporal action detection; anchor-free detector;

DOI:

10.1109/ICASSP43922.2022.9746817

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Recognizing action patterns and detecting action instances are both vital for the spatio-temporal action detection task, which aims to recognize the actions of interest in untrimmed videos and localize them in both space and time. Mainstream action tubelet detectors, however, ignore the conflict between the features needed for localization and those needed for classification, and use localization features for temporal modeling, which leads to ineffective action classification. In this paper, we propose the Spatio-Temporal Motion Aggregation mechanism, which integrates the local motion feature from a short-term snippet with longer-range spatio-temporal information to predict the action category. We design the Class-Agnostic Center Localization module to localize action instance centers in a class-agnostic manner. In addition, we propose Movement and Size Regression for movement estimation and spatial extent detection, using Gaussian kernels to encode training samples. These three modules work together to generate tubelet detection results, which can be further linked with a matching strategy to yield video-level tubes. Our detector achieves state-of-the-art performance in both frame-mAP and video-mAP metrics on the UCF-24 and JHMDB datasets.
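The abstract does not spell out how Gaussian kernels encode the training samples. As a rough illustration only, the sketch below shows one common encoding used by anchor-free detectors: each ground-truth center is rendered as a Gaussian peak on a single class-agnostic map. The function name `gaussian_center_heatmap`, the `sigma` parameter, and the element-wise maximum for overlapping peaks are assumptions made for this example, not details taken from the paper.

```python
import math


def gaussian_center_heatmap(height, width, centers, sigma=2.0):
    """Render one Gaussian peak per (cx, cy) center on a single map.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                # Squared distance from this pixel to the instance center.
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                # Where peaks overlap, keep the larger response (assumption).
                if value > heatmap[y][x]:
                    heatmap[y][x] = value
    return heatmap


# Two hypothetical instance centers given as (x, y) on an 8 x 10 map.
heatmap = gaussian_center_heatmap(8, 10, [(3, 4), (7, 2)])
```

Under these assumptions the map equals 1.0 exactly at each center and decays smoothly around it, so pixels near a center receive a softer training target than distant background pixels.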


Pages: 2180-2184

Page count: 5
