SPATIO-TEMPORAL MOTION AGGREGATION NETWORK FOR VIDEO ACTION DETECTION

Cited by: 3
Authors
Zhang, Hongcheng [1 ]
Zhao, Xu [1 ]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China
Keywords
video understanding; video action detection; spatio-temporal action detection; anchor-free detector;
DOI
10.1109/ICASSP43922.2022.9746817
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Recognizing action patterns and detecting action instances are both vital for spatio-temporal action detection, which aims to recognize the actions of interest in untrimmed videos and localize them in both space and time. Mainstream action tubelet detectors, however, ignore the conflict between the features needed for localization and those needed for classification, and use localization features for temporal modeling, which leads to ineffective action classification. In this paper, we propose the Spatio-Temporal Motion Aggregation mechanism, which integrates the local motion feature from a short-term snippet with longer-range spatio-temporal information to predict the action category. We design the Class-Agnostic Center Localization module to localize action instance centers in a class-agnostic manner. In addition, Movement and Size Regression is proposed for movement estimation and spatial extent detection, using Gaussian kernels to encode training samples. These three modules work together to generate tubelet detection results, which can be further linked into video-level tubes with a matching strategy. Our detector achieves state-of-the-art performance in both frame-mAP and video-mAP on the UCF-24 and JHMDB datasets.
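The abstract mentions encoding training samples with Gaussian kernels but does not give the details. As a rough illustration only, the sketch below shows one common way anchor-free detectors build such targets: a 2D Gaussian is placed at each ground-truth box center on a heatmap. The function names, the `sigma_scale` parameter, and the rule tying the Gaussian width to box size are assumptions made for this example, not the paper's actual formulation.

```python
import numpy as np

def gaussian_center_heatmap(height, width, center_xy, sigma):
    """Return a (height, width) map with a 2D Gaussian peak of 1.0 at center_xy."""
    cx, cy = center_xy
    ys, xs = np.mgrid[0:height, 0:width]
    sq_dist = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def encode_centers(height, width, boxes, sigma_scale=0.1):
    """Encode boxes (x1, y1, x2, y2) as a single class-agnostic center heatmap.

    sigma_scale is a hypothetical hyperparameter: the Gaussian width grows
    with the longer box side, with a floor of one pixel.
    """
    heatmap = np.zeros((height, width), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        cx = int(round((x1 + x2) / 2.0))
        cy = int(round((y1 + y2) / 2.0))
        sigma = max(1.0, sigma_scale * max(x2 - x1, y2 - y1))
        # Overlapping instances keep the element-wise maximum response.
        heatmap = np.maximum(
            heatmap, gaussian_center_heatmap(height, width, (cx, cy), sigma)
        )
    return heatmap

# Two illustrative boxes on a 32x32 feature map.
heatmap = encode_centers(32, 32, [(4, 4, 12, 12), (18, 10, 30, 26)])
```

Here each center pixel receives the value 1.0 and nearby pixels receive smoothly decaying values, so a detector trained on such a target is penalized less for predictions that land close to the true center.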
Pages: 2180-2184
Page count: 5