Real-Time Action Detection Based on Spatio-Temporal Interaction Perception

Authors
Ke, Xiao [1, 2, 3]
Miao, Xin [1, 2, 3]
Guo, Wen-Zhong [1, 2, 3]
Affiliations
[1] College of Computer and Data Science, Fuzhou University, Fujian, Fuzhou, 350116, China
[2] Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, Fuzhou University, Fujian, Fuzhou, 350116, China
[3] Key Laboratory of Spatial Data Mining & Information Sharing, Ministry of Education, Fujian, Fuzhou, 350003, China
Source
Funding
National Natural Science Foundation of China
Keywords
DOI
10.12263/DZXB.20220859
Chinese Library Classification number
O43 [Optics]; T [Industrial Technology]
Discipline classification codes
070207; 08; 0803
Abstract
Spatiotemporal action detection requires combining the spatial and temporal information in a video. Current state-of-the-art approaches usually use a 2D CNN (Convolutional Neural Network) or a 3D CNN architecture. However, because of the complexity of the network structure and of spatiotemporal information extraction, these methods usually run offline rather than in real time. To solve this problem, this paper proposes a real-time action detection method based on spatiotemporal interaction perception. First, the input video is rearranged out of order to enhance the temporal information. Because 2D or 3D backbone networks alone cannot model spatiotemporal features effectively, a multi-branch feature extraction network is proposed to extract features from different sources, and a multi-scale attention network is proposed to extract long-term temporal dependencies and spatial context information. Then, to fuse the temporal and spatial features from the two sources, a new motion saliency enhancement fusion strategy is proposed; it guides the fusion by encoding the temporal and spatial features so that the more discriminative spatiotemporal features are highlighted. Finally, action tube links are generated online from the frame-level detection results. The proposed method achieves accuracies of 84.71% and 78.4% on the two spatiotemporal action datasets UCF101-24 and JHMDB-21, respectively, and runs at 73 frames per second, which is superior to state-of-the-art methods. In addition, to address the high inter-class similarity and easily confused hard samples in the JHMDB-21 dataset, this paper proposes a key-frame optical flow action detection method based on action representation, which avoids generating redundant optical flow and further improves detection accuracy. © 2024 Chinese Institute of Electronics. All rights reserved.
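The online tube-linking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the linking rule, so everything below is an assumption for illustration rather than the authors' method: greedy matching by intersection-over-union (IoU), the hypothetical `Tube` class and `link_frame` function, and the 0.5 threshold.

```python
from dataclasses import dataclass, field


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tube:
    """Hypothetical action tube: one label plus (frame, box, score) entries."""
    label: str
    boxes: list = field(default_factory=list)


def link_frame(tubes, frame_idx, detections, iou_thr=0.5):
    """Extend tubes with one frame of detections, given as (label, box, score).

    Each detection, taken in descending score order, joins the same-label
    tube that ended on the previous frame and overlaps it most (IoU at
    least iou_thr, an assumed value); otherwise it starts a new tube.
    """
    for label, box, score in sorted(detections, key=lambda d: d[2], reverse=True):
        best, best_iou = None, iou_thr
        for tube in tubes:
            last_frame, last_box, _ = tube.boxes[-1]
            # Tubes already extended or created in this frame end on
            # frame_idx, so this check also prevents double assignment.
            if tube.label != label or last_frame != frame_idx - 1:
                continue
            overlap = iou(last_box, box)
            if overlap >= best_iou:
                best, best_iou = tube, overlap
        if best is None:
            tubes.append(Tube(label, [(frame_idx, box, score)]))
        else:
            best.boxes.append((frame_idx, box, score))
    return tubes
```

Because each call only needs the tubes built so far and the detections of the current frame, a loop of this shape can run frame by frame as the video streams in, which is what makes the linking online.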
Pages: 574-588