Three-Stream Action Tubelet Detector for Spatiotemporal Action Detection in Videos

Cited by: 1
Authors
Wu, Yutang [1, 2]
Wang, Hanli [1, 2]
Li, Qinyu [1, 3]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Minist Educ, Shanghai 200092, Peoples R China
[3] Lanzhou City Univ, Dept Comp Sci, Lanzhou 730070, Gansu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Human action detection; Three-stream architecture; Action tubelet detector; Pose stream
DOI
10.1007/978-3-030-00767-6_28
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, human action detection in videos has gained wide attention. Instead of detecting actions frame by frame, the action tubelet (ACT) detector detects human actions over sequences of frames and, in its two-stream form, achieves remarkable performance in both accuracy and speed. In this work, a three-stream action tubelet detector (three-stream ACT detector) is proposed. Compared with the two-stream architecture, it adds an extra pose stream to obtain more information about human actions and fuses the three streams by weighted average. Experimental results on the benchmark UCF-Sports, J-HMDB and UCF-101 datasets demonstrate that the proposed three-stream ACT detector framework is able to boost the performance of human action detection.
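The weighted-average fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state what quantity is fused or which weights are used, so everything here is an assumption: the function name `fuse_streams`, the choice of per-class scores as the fused quantity, the stream names, and the default equal weights are all hypothetical and are not taken from the paper.

```python
def fuse_streams(rgb_scores, flow_scores, pose_scores, weights=(1.0, 1.0, 1.0)):
    """Weighted average of per-class scores from three streams.

    Hypothetical helper: the paper's actual fused quantity and weights
    are not given in the abstract.
    """
    streams = (rgb_scores, flow_scores, pose_scores)
    # All streams must score the same set of classes.
    if len({len(s) for s in streams}) != 1:
        raise ValueError("all streams must score the same number of classes")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise by the weight sum so the result stays on the input scale.
    return [
        sum(w * s[i] for w, s in zip(weights, streams)) / total
        for i in range(len(rgb_scores))
    ]


if __name__ == "__main__":
    # Illustrative scores for two classes; the values are made up.
    print(fuse_streams([0.2, 0.8], [0.4, 0.6], [0.6, 0.4], weights=(2.0, 1.0, 1.0)))
```

With equal weights this reduces to a plain mean of the three streams; raising one weight shifts the fused score toward that stream.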
Pages: 296-306
Number of pages: 11
Related papers
50 in total
  • [41] Anomaly Detection for Spatiotemporal Data in Action
    Yang, Guang
    Kulkarni, Ninad
    Dua, Paavani
    Khullar, Dipika
    Chirayath, Alex Anto
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4844 - 4845
  • [42] Two-Stream Convolutional Networks for Action Recognition in Videos
    Simonyan, Karen
    Zisserman, Andrew
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [43] 3s-STNet: three-stream spatial-temporal network with appearance and skeleton information learning for action recognition
    Fang, Ming
    Peng, Siyu
    Zhao, Yang
    Yuan, Haibo
    Hung, Chih-Cheng
    Liu, Shuhua
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (02): : 1835 - 1848
  • [44] COWO: towards real-time spatiotemporal action localization in videos
    Yi, Yang
    Sun, Yang
    Yuan, Saimei
    Zhu, Yiji
    Zhang, Mengyi
    Zhu, Wenjun
    ASSEMBLY AUTOMATION, 2022, 42 (02) : 202 - 208
  • [45] A Spatiotemporal Heterogeneous Two-Stream Network for Action Recognition
    Chen, Enqing
    Bai, Xue
    Gao, Lei
    Tinega, Haron Chweya
    Ding, Yingqiang
    IEEE ACCESS, 2019, 7 : 57267 - 57275
  • [46] Two-stream spatiotemporal networks for skeleton action recognition
    Wang, Lei
    Zhang, Jianwei
    Yang, Shanmin
    Gu, Song
    IET IMAGE PROCESSING, 2023, 17 (11) : 3358 - 3370
  • [47] Three-stream interaction decoder network for RGB-thermal salient object detection
    Huo, Fushuo
    Zhu, Xuegui
    Li, Bingheng
    KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [48] Three-stream network with context convolution module for human-object interaction detection
    Siadari, Thomhert S.
    Han, Mikyong
    Yoon, Hyunjin
    ETRI JOURNAL, 2020, 42 (02) : 230 - 238
  • [49] Spatiotemporal Deformable Part Models for Action Detection
    Tian, Yicong
    Sukthankar, Rahul
    Shah, Mubarak
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 2642 - 2649
  • [50] Online human action detection and anticipation in videos: A survey
    Hu, Xuejiao
    Dai, Jingzhao
    Li, Ming
    Peng, Chenglei
    Li, Yang
    Du, Sidan
    NEUROCOMPUTING, 2022, 491 : 395 - 413