SRI3D: Two-stream inflated 3D ConvNet based on sparse regularization for action recognition

Cited by: 2
Authors
Yang, Zhaoqilin [1]
An, Gaoyun [1, 3]
Zhang, Ruichen [2]
Zheng, Zhenxing [1]
Ruan, Qiuqi [1]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing, Peoples R China
[2] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing, Peoples R China
[3] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
computer vision; convolutional neural nets; neural nets; video signal processing; NETWORKS;
DOI
10.1049/ipr2.12725
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although most state-of-the-art action recognition models adopt a two-stream 3D convolutional structure as the backbone network, few works have studied the impact of loss functions on these models. Sparsity is used as key prior knowledge in many fields, yet, as far as is known, no one has studied how the sparsity of the network output influences the results of deep learning-based action recognition models. This paper therefore proposes SRI3D, a novel two-stream inflated 3D ConvNet based on sparse regularization, for action recognition. To let the network learn a sparse output, the ℓ1 norm is embedded in the loss function as a plug-and-play regularization term. As a result, the classification result after fusing the two streams can only be the category with the highest confidence in one of the streams, and no other category. The proposed loss function makes the output vector of the neural network as sparse as possible so that the classification results are not ambiguous. Experimental results show that SRI3D has a competitive advantage over other state-of-the-art models on Kinetics-400, Something-Something V2, UCF-101 and HMDB-51.
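The abstract does not give the loss formula, so the following is only a minimal sketch of one common way to embed an ℓ1 norm in a classification loss: cross-entropy plus a weighted ℓ1 penalty on the network's raw (pre-softmax) output vector. The function names, the weight `lam`, and the choice to penalize pre-softmax scores are assumptions made for illustration, not details taken from the paper. The penalty is placed on the raw scores here because the ℓ1 norm of a softmax output is always 1 and so would carry no training signal.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_regularized_loss(logits, target, lam=0.01):
    """Cross-entropy plus lam * ||logits||_1 (assumed form, not the paper's).

    logits: raw output vector of one stream for one clip
    target: index of the ground-truth class
    lam:    hypothetical regularization weight
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target])
    l1_penalty = sum(abs(z) for z in logits)
    return cross_entropy + lam * l1_penalty


# Example: the same prediction, scored with and without the penalty.
logits = [2.0, 0.0, -1.0]
plain = sparse_regularized_loss(logits, target=0, lam=0.0)
regularized = sparse_regularized_loss(logits, target=0, lam=0.1)
print(plain, regularized)
```

Because the penalty is simply added to the base loss, a term of this kind can be attached to an existing training objective without changing the network itself, which is one reading of the "plug-and-play" description above.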
Pages: 1438-1448
Number of pages: 11