Spatial-temporal multiscale feature optimization based two-stream convolutional neural network for action recognition

Cited by: 1
Authors
Xia, Limin [1]
Fu, Weiye [1]
Affiliations
[1] Central South University, School of Automation, Changsha 410083, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Two-stream network; Attention mechanism; Multiscale features
DOI
10.1007/s10586-024-04553-w
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Human action recognition is one of the most challenging tasks in computer vision due to background noise interference and video frame redundancy. Therefore, we propose a two-stream Convolutional Neural Network based on Spatial-Temporal Multiscale Feature Optimization (ST-MFO). Specifically, multiscale features generated by a pyramid pooling network are combined with improved coordinate attention, which results in richer feature representation and reduces background noise interference. Meanwhile, we introduce density peak clustering based on a nonlinear kernel function, which can extract more representative key frames. To improve classification efficiency, we also assign varying degrees of attention to key frames through temporal attention. In addition, we propose an attention-based spatial-temporal information interaction module that optimizes temporal and spatial features with complementarity between temporal and spatial information. Experimental results on four benchmark video datasets show that ST-MFO achieves comparable or better performance than state-of-the-art methods.
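As a rough illustration of the key-frame step mentioned in the abstract, the sketch below applies density peak clustering to per-frame feature vectors. The abstract does not specify the nonlinear kernel, the cutoff distance, or how frames are described, so everything here is an assumption: a Gaussian kernel stands in for the kernel, the cutoff `d_c` defaults to the 20th percentile of pairwise distances, and the function name `density_peak_key_frames` is hypothetical. It is not the authors' implementation.

```python
import numpy as np


def density_peak_key_frames(features, n_key_frames, d_c=None):
    """Return sorted indices of frames that look like density peaks.

    Assumptions (not from the paper): Gaussian kernel for local density,
    percentile-based cutoff distance, score = density * separation.
    """
    x = np.asarray(features, dtype=float)
    n = x.shape[0]

    # Pairwise Euclidean distances between frame descriptors.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

    if d_c is None:
        # Assumed heuristic: 20th percentile of the distinct pairwise distances.
        d_c = np.percentile(dist[np.triu_indices(n, k=1)], 20)

    # Local density with a Gaussian kernel; subtract 1 to drop the self term.
    rho = np.exp(-((dist / d_c) ** 2)).sum(axis=1) - 1.0

    # Separation: distance to the nearest frame with strictly higher density.
    # The densest frame gets its largest distance to any other frame.
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()

    # Frames that are both dense and well separated score highest.
    score = rho * delta
    keys = np.argsort(score)[::-1][:n_key_frames]
    return np.sort(keys)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "video": 10 frames near one appearance, then 10 near another.
    frames = np.vstack([
        rng.normal(0.0, 0.1, size=(10, 2)),
        rng.normal(10.0, 0.1, size=(10, 2)),
    ])
    print(density_peak_key_frames(frames, n_key_frames=2))
```

On the toy input, the two selected indices should fall in different halves of the sequence, since each group of similar frames contributes one density peak. A temporal attention stage, as described in the abstract, would then weight the selected frames rather than treat them equally.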
Pages: 11611-11626
Number of pages: 16