Three-Stream Action Tubelet Detector for Spatiotemporal Action Detection in Videos

Cited by: 1
Authors
Wu, Yutang [1, 2]
Wang, Hanli [1, 2]
Li, Qinyu [1, 3]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Minist Educ, Shanghai 200092, Peoples R China
[3] Lanzhou City Univ, Dept Comp Sci, Lanzhou 730070, Gansu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Human action detection; Three-stream architecture; Action tubelet detector; Pose stream;
DOI
10.1007/978-3-030-00767-6_28
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In recent years, human action detection in videos has gained wide attention. Instead of detecting actions frame by frame, the action tubelet (ACT) detector detects human actions sequence by sequence, and in its two-stream form it achieves remarkable performance in both accuracy and speed. In this work, a three-stream action tubelet detector (three-stream ACT detector) is proposed. Compared to the two-stream architecture, it adds an extra pose stream to obtain more information about human actions and fuses the three streams by weighted average. The experimental results on the benchmark UCF-Sports, J-HMDB and UCF-101 datasets demonstrate that the proposed three-stream ACT detector framework is able to boost the performance of human action detection.
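The weighted-average fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which quantities are fused or which weights are used, so everything here is an assumption for illustration: the function name `fuse_scores`, the stream names `rgb`, `flow` and `pose`, the per-class score lists, and the example weights are all hypothetical.

```python
def fuse_scores(rgb, flow, pose, weights=(1.0, 1.0, 1.0)):
    """Fuse per-class scores from three streams by weighted average.

    Hypothetical helper: the stream names and the default weights are
    assumptions, not values taken from the paper.
    """
    streams = (rgb, flow, pose)
    if not (len(rgb) == len(flow) == len(pose)):
        raise ValueError("all streams must score the same number of classes")
    if len(weights) != 3:
        raise ValueError("expected one weight per stream")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # For each class index, take the weighted sum across streams and
    # normalise by the total weight so the result is an average.
    return [
        sum(w * s[i] for w, s in zip(weights, streams)) / total
        for i in range(len(rgb))
    ]


# Illustrative scores for two action classes (made-up numbers).
fused = fuse_scores([0.6, 0.4], [0.2, 0.8], [0.4, 0.6], weights=(2.0, 1.0, 1.0))
print(fused)
```

Normalising by the sum of the weights keeps the fused scores on the same scale as the inputs, so equal weights reduce to a plain mean of the three streams.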
Pages: 296-306
Number of pages: 11