Swin-Fusion: Swin-Transformer with Feature Fusion for Human Action Recognition

Cited by: 10
Authors
Chen, Tiansheng [1]
Mo, Lingfei [1]
Affiliation
[1] Southeast Univ, Sch Instrument Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Swin-Transformer; Feature pyramid; Image classification; NETWORK;
DOI
10.1007/s11063-023-11367-1
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Human action recognition based on still images is one of the most challenging computer vision tasks. In the past decade, convolutional neural networks (CNNs) have developed rapidly and achieved good performance in human action recognition tasks based on still images. Because CNNs lack long-range perception, it is difficult for them to form a global structural understanding of human behavior and of the overall relationship between the behavior and the environment. Recently, transformer-based models have been making a splash in computer vision, even reaching SOTA in several vision tasks. We explore the transformer's capability in human action recognition based on still images and add a simple but effective feature fusion module on top of the Swin-Transformer model. More specifically, we propose a new transformer-based model for behavioral feature extraction that uses a pre-trained Swin-Transformer as the backbone network. Swin-Transformer's distinctive hierarchical structure, combined with the feature fusion module, is used to extract and fuse multi-scale behavioral information. Extensive experiments were conducted on five still-image-based human action recognition datasets: Li's action dataset, the Stanford-40 dataset, the PPMI-24 dataset, the AUC-V1 dataset, and the AUC-V2 dataset. Results indicate that our proposed Swin-Fusion model achieves better behavior recognition than previously improved CNN-based models by sharing and reusing feature maps of different scales at multiple stages, without modifying the original backbone training method and while increasing training resources by only 1.6%. The code and models will be available at https://github.com/cts4444/Swin-Fusion.
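The multi-scale fusion idea described in the abstract can be illustrated with a minimal PyTorch sketch. This is hypothetical code, not the authors' released implementation: the FusionHead class, the stage widths (96, 192, 384, 768, which follow Swin-T defaults), and the 256-dimensional projection are illustrative assumptions. Each hierarchical stage's feature map is projected to a shared width, pooled, concatenated, and classified.

```python
# Hypothetical sketch of fusing multi-scale features from a hierarchical
# backbone such as Swin-Transformer; names and shapes are illustrative.
import torch
import torch.nn as nn


class FusionHead(nn.Module):
    """Project each stage's feature map to a shared width, pool, concatenate,
    and classify. Channel widths follow Swin-T defaults (96, 192, 384, 768)."""

    def __init__(self, stage_dims=(96, 192, 384, 768), embed_dim=256, num_classes=40):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, embed_dim, kernel_size=1) for c in stage_dims])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(embed_dim * len(stage_dims), num_classes)

    def forward(self, feats):
        # feats: list of per-stage maps, each of shape [B, C_i, H_i, W_i]
        pooled = [self.pool(p(f)).flatten(1) for p, f in zip(self.proj, feats)]  # [B, embed_dim] each
        return self.fc(torch.cat(pooled, dim=1))  # fuse all stages, then classify


if __name__ == "__main__":
    # Dummy multi-scale maps for a 224x224 input (stage strides 4, 8, 16, 32).
    feats = [torch.randn(2, c, s, s) for c, s in zip((96, 192, 384, 768), (56, 28, 14, 7))]
    print(FusionHead(num_classes=40)(feats).shape)  # torch.Size([2, 40])
```

The choice of num_classes=40 here simply matches the Stanford-40 dataset mentioned in the abstract; in practice it would be set per dataset.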
Pages: 11109-11130
Number of pages: 22
Related Papers
50 records in total
  • [41] Speech Keyword Spotting Method Based on Swin-Transformer Model
    Chengli Sun
    Bikang Chen
    Feilong Chen
    Yan Leng
    Qiaosheng Guo
    International Journal of Computational Intelligence Systems, 17
  • [42] Radar gait recognition using Dual-branch Swin Transformer with Asymmetric Attention Fusion
    He, Wentao
    Ren, Jianfeng
    Bai, Ruibin
    Jiang, Xudong
    PATTERN RECOGNITION, 2025, 159
  • [43] Shortwave protocol signal recognition based on Swin-Transformer
    Zhu, Zhengyu
    Chen, Pengfei
    Wang, Zixuan
    Gong, Kexian
    Wu, Di
    Wang, Zhongyong
    JOURNAL ON COMMUNICATIONS, 2022, (11): 127-135
  • [44] Graph-Structured Swin-Transformer for Learned Image Compression
    Wang, Lilong
    Shi, Yunhui
    Wang, Jin
    Yin, Baocai
    Ling, Nam
    2024 DATA COMPRESSION CONFERENCE, DCC, 2024, : 592 - 592
  • [45] A Fusion Deraining Network Based on Swin Transformer and Convolutional Neural Network
    Tang, Junhao
    Feng, Guorui
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (07) : 1254 - 1257
  • [46] Remote Sensing Image Fusion Method Based on Improved Swin Transformer
    Li Zitong
    Zhao Jiankang
    Xu Jingran
    Long Haihui
    Liu Chuanqi
    ACTA PHOTONICA SINICA, 2023, 52 (11)
  • [47] SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images
    Wang, Zhishe
    Chen, Yanlin
    Shao, Wenyu
    Li, Hui
    Zhang, Lei
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [48] A fusion-attention swin transformer for cardiac MRI image segmentation
    Yang, Ruiping
    Liu, Kun
    Liang, Yongquan
    IET IMAGE PROCESSING, 2024, 18 (01) : 105 - 115
  • [49] Automatic rock classification and recognition based on Swin-Transformer
    Yu, Wenjing
    Wang, Daitao
    Huang, Shuyi
    Huang, Jiawei
    Gao, Fuzhi
    Zhong, Jianbin
    MODERN COMPUTER, 2024, 30 (13): 15-20
  • [50] STEF: a Swin Transformer-Based Enhanced Feature Pyramid Fusion Model for Dongba character detection
    Ma, Yuqi
    Chen, Shanxiong
    Li, Yongbo
    He, Jingliu
    Ruan, Qiuyue
    Xiao, Wenjun
    Xiong, Hailing
    Li, Xiaoliang
    HERITAGE SCIENCE, 2024, 12 (01):