Spatiotemporal Action Detection Using 2D CNN and 3D CNN

Authors
Liu, Hengshuai [1]
Li, Jianjun [1]
Tang, Yuhong [1]
Zhang, Ningfei [1]
Zhang, Ming [1]
Wang, Yaping [1]
Li, Guang [1]
Affiliation
[1] Inner Mongolia University of Science and Technology, School of Digital and Intelligent Industry, Baotou 014000, Inner Mongolia, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Spatiotemporal action detection; CNN; Convolution and attention mechanisms
DOI
10.1016/j.compeleceng.2024.109739
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
To address the low accuracy of human spatiotemporal action detection, this study proposes a more effective CNN framework. Like the YOWO model, we use CNNs for feature extraction. However, we use only the extracted spatiotemporal features for action recognition, and the fused spatiotemporal and spatial features for action localization. In the action localization branch, we also improve the original channel fusion and attention mechanism (CFAM): a combination of convolution and attention mechanisms selectively replaces the conventional convolutions, making more effective use of the fused features. Finally, to make bounding-box regression more accurate, we replace the offset loss with the CIoU loss. Results show that the proposed method achieves frame-mAP scores (at IoU 0.5) of 75.73% on JHMDB-21 and 83.13% on UCF101-24. For video-mAP, it obtains 88.96%, 85.81% and 68.59% at IoU thresholds of 0.2, 0.5 and 0.75 on JHMDB-21, and 75.05%, 69.72% and 48.95% at IoU thresholds of 0.1, 0.2 and 0.5 on UCF101-24.
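The abstract does not give the loss explicitly. What follows is therefore a minimal sketch of the standard Complete-IoU (CIoU) loss as defined by Zheng et al. (Distance-IoU loss, AAAI 2020), not the authors' code. It assumes boxes given as corner coordinates (x1, y1, x2, y2). The function name `ciou_loss` and the `eps` stabilizer are illustrative choices. For a predicted box b and ground truth b^gt, the loss is 1 - IoU + ρ²(b, b^gt)/c² + αv. Here ρ is the distance between the box centers, c is the diagonal of the smallest box enclosing both, v measures aspect-ratio mismatch, and α weights v. A real detector would compute this in batched, differentiable form inside its training framework; this scalar version only shows the arithmetic.

```python
import math


def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss between two axis-aligned boxes.

    Illustrative sketch: boxes are assumed to be (x1, y1, x2, y2) tuples;
    this format is an assumption, not taken from the paper.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain intersection-over-union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # rho^2: squared Euclidean distance between the two box centers.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + \
           ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    # c^2: squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # v: aspect-ratio consistency term; alpha: its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

For example, `ciou_loss((0, 0, 1, 1), (2, 0, 3, 1))` returns approximately 1.4. The IoU term contributes 1 because the boxes do not overlap, and the center-distance term adds 4/10. A plain IoU loss stays flat at 1 for any non-overlapping pair. The distance term here keeps changing as the predicted box moves toward the target, which is the usual motivation for CIoU over offset- or IoU-only regression.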
Pages: 12