Spatial-temporal multiscale feature optimization based two-stream convolutional neural network for action recognition

Cited by: 1
Authors
Xia, Limin [1]
Fu, Weiye [1]
Affiliations
[1] Cent South Univ, Sch Automat, Changsha 410083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Two-stream network; Attention mechanism; Multiscale features
DOI
10.1007/s10586-024-04553-w
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Human action recognition is one of the most challenging tasks in computer vision due to background noise interference and video frame redundancy. Therefore, we propose a two-stream Convolutional Neural Network based on Spatial-Temporal Multiscale Feature Optimization (ST-MFO). Specifically, multiscale features generated by a pyramid pooling network are combined with improved coordinate attention, which results in richer feature representation and reduces background noise interference. Meanwhile, we introduce density peak clustering based on a nonlinear kernel function, which can extract more representative key frames. To improve classification efficiency, we also assign varying degrees of attention to key frames through temporal attention. In addition, we propose an attention-based spatial-temporal information interaction module that optimizes temporal and spatial features with complementarity between temporal and spatial information. Experimental results on four benchmark video datasets show that ST-MFO achieves comparable or better performance than state-of-the-art methods.
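The key-frame step mentioned in the abstract can be illustrated with a minimal sketch of density peak clustering. The abstract does not state which nonlinear kernel, distance, or frame features the authors use, so everything here is an assumption: a Gaussian kernel for the local density, Euclidean distance between per-frame feature vectors, a percentile-based cutoff, and the hypothetical function name `select_key_frames`. It is not the authors' implementation.

```python
import numpy as np


def select_key_frames(features, num_key_frames, cutoff_percentile=20.0):
    """Pick key frames as density peaks (illustrative sketch only).

    features: array of shape (num_frames, feature_dim), one row per frame.
    Returns the sorted indices of the selected frames.
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    if n <= num_key_frames:
        return np.arange(n)

    # Pairwise Euclidean distances between frame features.
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    # Cutoff distance d_c: a percentile of the off-diagonal distances (assumed heuristic).
    off_diag = dist[~np.eye(n, dtype=bool)]
    d_c = max(np.percentile(off_diag, cutoff_percentile), 1e-12)

    # Local density with a Gaussian kernel (assumed kernel); subtract the self term.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

    # delta: distance to the nearest frame with higher density.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    delta[order[0]] = dist[order[0]].max()  # densest frame gets the maximum distance
    for rank in range(1, n):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()

    # Frames with both high density and high separation are cluster centers.
    gamma = rho * delta
    top = np.argsort(-gamma, kind="stable")[:num_key_frames]
    return np.sort(top)
```

In this sketch the frames chosen are the cluster centers themselves, so each key frame is an actual frame from the video rather than an averaged representative.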
Pages: 11611-11626
Page count: 16