Action detection with two-stream enhanced detector

Cited by: 3
Authors
Zhang, Min [1]
Hu, Haiyang [1]
Li, Zhongjin [1]
Chen, Jie [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou, Peoples R China
Source
VISUAL COMPUTER | 2023 / Vol. 39 / Issue 03
Funding
National Natural Science Foundation of China;
Keywords
Action detection; Spatiotemporal localization; Object detection; Anchor cuboid; ATTENTION;
DOI
10.1007/s00371-021-02397-8
Chinese Library Classification
TP31 [Computer Software];
Discipline Classification Code
081202 ; 0835 ;
Abstract
Action understanding in videos is a challenging task that has attracted widespread attention in recent years. Most current methods localize the bounding boxes of actors at the frame level, and then track or link these detections to form action tubes across frames. These methods often focus on utilizing temporal context in videos while neglecting the importance of the detector itself. In this paper, we present a two-stream enhanced framework to deal with the problem of action detection. Specifically, we devise appearance and motion detectors in a two-stream manner to detect actions, which take k consecutive RGB frames and optical flow images as input, respectively. To improve the feature representation capabilities, an anchor refinement sub-module with feature alignment is introduced into the two-stream architecture to generate flexible anchor cuboids. Meanwhile, a hierarchical fusion strategy is utilized to concatenate intermediate feature maps for capturing fast-moving subjects. Moreover, layer normalization with skip connection is adopted to reduce the internal covariate shift between network layers, which makes the training process simple and effective. Compared to state-of-the-art methods, the proposed approach yields an impressive performance gain on three prevailing datasets: UCF-Sports, UCF-101 and J-HMDB, which confirms the effectiveness of our enhanced detector for action detection.
Pages: 1193 - 1204
Page count: 12