Online Action Tube Detection via Resolving the Spatio-temporal Context Pattern

Cited by: 2
Authors
Huang, Jingjia [1]
Li, Nannan [2]
Zhong, Jiaxing [1]
Li, Thomas H. [3]
Li, Ge [1]
Affiliations
[1] Peking Univ, Sch Elect & Comp Engn, Beijing, Peoples R China
[2] Peking Univ, Shenzhen Grad Sch, Beijing, Peoples R China
[3] Gpower Semicond Inc, Suzhou, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
Spatio-temporal action detection; encoder-decoder model; online action tube generation
DOI
10.1145/3240508.3240659
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Spatio-temporal action detection in video remains a challenging problem because of complex backgrounds, the variety of actions, and viewpoint changes in unconstrained environments. Most current approaches solve the problem in two steps: first detecting actions in each frame, then linking the detections. This neglects the continuity of the action and operates in an offline, batch-processing manner. In this paper, we attempt to build an online action detection model that exploits the spatio-temporal coherence among action regions when inferring the action category and localizing its position. Specifically, we represent the spatio-temporal context pattern with an encoder-decoder model based on a convolutional recurrent network. The model accepts a video snippet as input and encodes the dynamic information of the action in the forward pass. During the backward pass, it resolves this information at each time instant for action detection by fusing it with the current static or motion cue. Additionally, we propose an incremental action tube generation algorithm, which accomplishes action bounding-box association, action label determination, and temporal trimming in a single pass. Our model takes appearance, motion, or fused signals as input and is tested on two widely used datasets, UCF-Sports and UCF-101. The experimental results demonstrate the effectiveness of our method, which achieves performance superior or comparable to that of existing approaches.
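The abstract does not give the details of the incremental tube generation algorithm, so the following is only a generic sketch of what single-pass tube building can look like, not the authors' method. It assumes greedy IoU matching for box association, accumulated class scores for label determination, and a miss counter for temporal trimming. The names `Tube`, `update_tubes`, `iou_thresh`, and `max_misses` are hypothetical.

```python
from dataclasses import dataclass, field


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tube:
    """Hypothetical container for one action tube under construction."""
    start: int                                          # frame index of the first box
    boxes: list = field(default_factory=list)           # one box per matched frame
    label_scores: dict = field(default_factory=dict)    # accumulated class scores
    misses: int = 0                                     # consecutive unmatched frames
    closed: bool = False                                # True once the tube is trimmed

    def add(self, box, scores):
        self.boxes.append(box)
        for label, score in scores.items():
            self.label_scores[label] = self.label_scores.get(label, 0.0) + score
        self.misses = 0

    @property
    def label(self):
        # Label determination: class with the highest accumulated score.
        return max(self.label_scores, key=self.label_scores.get)


def update_tubes(tubes, frame_idx, detections, iou_thresh=0.5, max_misses=2):
    """Process one frame. `detections` is a list of (box, {label: score})."""
    unmatched = list(detections)
    for tube in tubes:
        if tube.closed:
            continue
        # Association: greedily take the detection that best overlaps the last box.
        best = max(unmatched, key=lambda d: iou(tube.boxes[-1], d[0]), default=None)
        if best is not None and iou(tube.boxes[-1], best[0]) >= iou_thresh:
            tube.add(*best)
            unmatched.remove(best)
        else:
            # Temporal trimming: close the tube after too many consecutive misses.
            tube.misses += 1
            if tube.misses > max_misses:
                tube.closed = True
    # Any detection left over starts a new tube.
    for box, scores in unmatched:
        new_tube = Tube(start=frame_idx)
        new_tube.add(box, scores)
        tubes.append(new_tube)
    return tubes
```

Calling `update_tubes` once per incoming frame keeps the procedure online: each frame is seen exactly once, and a tube's label and temporal extent are available as soon as the tube is closed.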
Pages: 993-1001
Page count: 9
Related papers
50 in total
  • [1] Online Spatio-temporal Action Detection for Eldercare
    Koh, Thean Chun
    Yeo, Chai Kiat
    Jing, Xuan
    [J]. 2023 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI, 2023, : 126 - 127
  • [2] Online Convolutional Network Tracking via Spatio-Temporal Context
    Liu P.
    Wang H.
    Luo Y.
    Du Y.
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2018, 55 (12): : 2785 - 2793
  • [3] Online convolution network tracking via spatio-temporal context
    Wang, Hongxiang
    Liu, Peizhong
    Du, Yongzhao
    Liu, Xiaofang
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (01) : 257 - 270
  • [4] Online convolution network tracking via spatio-temporal context
    Hongxiang Wang
    Peizhong Liu
    Yongzhao Du
    Xiaofang Liu
    [J]. Multimedia Tools and Applications, 2019, 78 : 257 - 270
  • [5] Online spatio-temporal action detection with adaptive sampling and hierarchical modulation
    Su, Shaowen
    Gan, Minggang
    [J]. Multimedia Systems, 2024, 30 (06)
  • [6] Joint Motion Context and Clip Augmentation for Spatio-temporal Action Detection
    Ma, Xurui
    Zhang, Xiang
    Wu, Chengkun
    Xu, Chuanfu
    Liu, Jie
    Luo, Zhigang
    [J]. FOURTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2021), 2022, 12084
  • [7] Conversation Group Detection With Spatio-Temporal Context
    Tan, Stephanie
    Tax, David M. J.
    Hung, Hayley
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022, 2022, : 170 - 180
  • [8] TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection
    Song, Lin
    Zhang, Shiwei
    Yu, Gang
    Sun, Hongbin
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 11979 - 11987
  • [9] Projection transform on spatio-temporal context for action recognition
    Wanru Xu
    Zhenjiang Miao
    Qiang Zhang
    [J]. Multimedia Tools and Applications, 2015, 74 : 7711 - 7728
  • [10] Clustered Spatio-Temporal Manifolds for Online Action Recognition
    Bloom, Victoria
    Makris, Dimitrios
    Argyriou, Vasileios
    [J]. 2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 3963 - 3968