STAN: Spatial-Temporal Awareness Network for Temporal Action Detection

Cited by: 0
Authors
Liu, Minghao [1]
Liu, Haiyi [1]
Zhao, Sirui [1]
Ma, Fei [2]
Li, Minglei [2]
Dai, Zonghong [2]
Wang, Hao [1]
Xu, Tong [1]
Chen, Enhong [1]
Affiliations
[1] University of Science and Technology of China (USTC), Hefei, Anhui, People's Republic of China
[2] Huawei Cloud Computing Technologies Co., Ltd., Shenzhen, Guangdong, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
temporal action detection; data imbalance; Bi-LSTM;
DOI
10.1145/3606038.3616169
CLC Number (Chinese Library Classification)
TP39 [Computer Applications];
Discipline Classification Codes
081203; 0835;
Abstract
In recent years, there have been significant advancements in the field of temporal action detection. However, few studies have focused on detecting actions in sporting events. In this context, the MMSports 2023 cricket bowl release challenge aims to identify the bowl release action by segmenting untrimmed videos. To this end, we propose a novel cricket bowl release detection framework based on a Spatial-Temporal Awareness Network (STAN), which consists of three modules: a spatial feature extraction module (SFEM), a temporal feature extraction module (TFEM), and a classification module (CM). Specifically, the SFEM first adopts ResNet to extract spatial features from video frames. The TFEM then aggregates these features over time with a Bi-LSTM to obtain spatial-temporal features. Afterward, the CM converts the spatial-temporal features into action category probabilities to localize the action segments. In addition, we introduce a weighted binary cross-entropy loss to address the data imbalance inherent in cricket bowl release detection. Experiments show that our proposed STAN achieves competitive performance, ranking 1st in the cricket bowl release challenge with a PQ score of 0.643. The code is publicly available at https://github.com/lmhr/STAN.
Pages: 161-165
Page count: 5
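
The abstract outlines a concrete per-frame pipeline: ResNet spatial features (SFEM), Bi-LSTM temporal aggregation (TFEM), frame-wise classification (CM), and a weighted binary cross-entropy loss for the rare bowl-release frames. The sketch below illustrates that structure in PyTorch; it is not the authors' implementation (see https://github.com/lmhr/STAN for that), and the ResNet depth, hidden size, input resolution, and positive-class weight are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class STAN(nn.Module):
    """Illustrative STAN-style model: SFEM (ResNet) -> TFEM (Bi-LSTM) -> CM.
    Hyperparameters are assumptions, not values from the paper."""
    def __init__(self, hidden_dim=512):
        super().__init__()
        # SFEM: ResNet backbone with its classification head removed
        # (ResNet-18 is an assumption; the abstract only says "ResNet").
        backbone = models.resnet18(weights=None)
        self.sfem = nn.Sequential(*list(backbone.children())[:-1])  # (B*T, 512, 1, 1)
        # TFEM: bidirectional LSTM aggregates per-frame features over time.
        self.tfem = nn.LSTM(512, hidden_dim, batch_first=True, bidirectional=True)
        # CM: maps each spatial-temporal feature to a bowl-release logit.
        self.cm = nn.Linear(2 * hidden_dim, 1)

    def forward(self, clip):
        # clip: (B, T, 3, H, W) frames from an untrimmed video
        b, t = clip.shape[:2]
        feats = self.sfem(clip.flatten(0, 1)).flatten(1)  # (B*T, 512)
        feats, _ = self.tfem(feats.view(b, t, -1))        # (B, T, 2*hidden_dim)
        return self.cm(feats).squeeze(-1)                 # (B, T) per-frame logits

# Weighted BCE: pos_weight up-weights the scarce positive (bowl-release) frames.
# The value 50.0 is a placeholder, not reported in this record.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(50.0))

model = STAN()
logits = model(torch.randn(2, 16, 3, 112, 112))  # 2 clips x 16 frames
loss = criterion(logits, torch.zeros(2, 16))     # dummy all-negative labels
```

Thresholding the per-frame probabilities and grouping consecutive positive frames would then yield the localized action segments the abstract describes.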