SPATIO-TEMPORAL MOTION AGGREGATION NETWORK FOR VIDEO ACTION DETECTION

Cited by: 3
Authors
Zhang, Hongcheng [1]
Zhao, Xu [1]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China
Keywords
video understanding; video action detection; spatio-temporal action detection; anchor-free detector
DOI
10.1109/ICASSP43922.2022.9746817
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recognizing action patterns and detecting action instances are vital for the spatio-temporal action detection task, which aims to recognize the actions of interest in untrimmed videos and localize them in both space and time. Mainstream action tubelet detectors, however, ignore the conflict between the features suited to localization and those suited to classification, and use localization features for temporal modeling, which leads to ineffective action classification. In this paper, we propose the Spatio-Temporal Motion Aggregation mechanism, which integrates the local motion features from a short-term snippet with longer-range spatio-temporal information to predict the action category. We design the Class-Agnostic Center Localization module to localize action instance centers in a class-agnostic manner. In addition, Movement and Size Regression is proposed for movement estimation and spatial extent detection, using Gaussian kernels to encode training samples. These three modules work together to generate tubelet detections, which can be further linked into video-level tubes with a matching strategy. Our detector achieves state-of-the-art performance in both frame-mAP and video-mAP on the UCF-24 and JHMDB datasets.
Pages: 2180-2184
Page count: 5