Swin-Fusion: Swin-Transformer with Feature Fusion for Human Action Recognition

Cited by: 10
Authors
Chen, Tiansheng [1]
Mo, Lingfei [1]
Affiliation
[1] Southeast Univ, Sch Instrument Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Swin-Transformer; Feature pyramid; Image classification; NETWORK;
DOI
10.1007/s11063-023-11367-1
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human action recognition based on still images is one of the most challenging computer vision tasks. In the past decade, convolutional neural networks (CNNs) have developed rapidly and achieved good performance in still-image-based human action recognition. However, because CNNs lack the ability to perceive long-range (global) context, it is difficult for them to form a global structural understanding of human behavior and of the overall relationship between the behavior and its environment. Recently, transformer-based models have gained prominence in computer vision, achieving state-of-the-art (SOTA) results on several vision tasks. We explore the capability of transformers for still-image-based human action recognition and add a simple but effective feature fusion module to the Swin-Transformer model. More specifically, we propose a new transformer-based model for behavioral feature extraction that uses a pre-trained Swin-Transformer as the backbone network. The distinctive hierarchical structure of the Swin-Transformer, combined with the feature fusion module, is used to extract and fuse multi-scale behavioral information. Extensive experiments were conducted on five still-image-based human action recognition datasets: Li's action dataset, the Stanford-40 dataset, the PPMI-24 dataset, the AUC-V1 dataset, and the AUC-V2 dataset. The results indicate that, by sharing and reusing feature maps of different scales across multiple stages, the proposed Swin-Fusion model achieves better recognition performance than previous improved CNN-based models, without modifying the original backbone training method and while increasing training resource consumption by only 1.6%. The code and models will be available at https://github.com/cts4444/Swin-Fusion.
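The abstract does not describe the internals of the feature fusion module, so the following is only a minimal sketch of the general idea it names: reusing feature maps from several hierarchical stages and fusing them into one descriptor. It assumes Swin-T-style stage shapes for a 224×224 input (patch size 4, 96 embedding channels, resolution halved and channels doubled at each stage) and swaps in a generic FPN-style top-down fusion with random projection weights. The function names (`swin_stage_shapes`, `upsample2x`, `fuse_features`), the 256-channel fused width, and the pool-and-concatenate head are hypothetical illustrations, not the Swin-Fusion module itself.

```python
import numpy as np


def swin_stage_shapes(img_size=224, patch=4, embed_dim=96, num_stages=4):
    """(H, W, C) of each stage for an assumed Swin-T-like backbone configuration."""
    shapes = []
    side, dim = img_size // patch, embed_dim
    for _ in range(num_stages):
        shapes.append((side, side, dim))
        # Patch merging between stages halves H and W and doubles C.
        side, dim = side // 2, dim * 2
    return shapes


def upsample2x(x):
    """Nearest-neighbour 2x upsampling over the spatial axes of an (H, W, C) map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def fuse_features(stage_maps, out_dim=256, seed=0):
    """Generic FPN-style top-down fusion (a stand-in, not the paper's module).

    Projection weights are random placeholders; in a real model they would be learned.
    """
    rng = np.random.default_rng(seed)
    # A 1x1 projection is a per-position matmul over the channel axis.
    laterals = [
        m @ (rng.standard_normal((m.shape[-1], out_dim)) / np.sqrt(m.shape[-1]))
        for m in stage_maps
    ]
    # Walk from the coarsest stage to the finest, adding upsampled coarse context.
    fused = [laterals[-1]]
    for lat in reversed(laterals[:-1]):
        fused.insert(0, lat + upsample2x(fused[0]))
    # Global-average-pool every fused level and concatenate into one descriptor.
    return np.concatenate([f.mean(axis=(0, 1)) for f in fused])


# Usage with random stand-in feature maps in place of real backbone outputs.
shapes = swin_stage_shapes()
stage_maps = [np.random.default_rng(i).standard_normal(s) for i, s in enumerate(shapes)]
fused_vector = fuse_features(stage_maps)
print(shapes)              # [(56, 56, 96), (28, 28, 192), (14, 14, 384), (7, 7, 768)]
print(fused_vector.shape)  # (1024,)
```

Top-down addition is only one common fusion choice (the one used in feature pyramid networks); concatenation or learned attention weighting are equally plausible, and the abstract does not say which the authors use. The nearest-neighbour upsampling assumes each stage has exactly half the resolution of the previous one, which holds for a 224×224 input.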
Pages: 11109-11130
Page count: 22
Related papers (50 in total)
  • [31] Learning Transferable Feature Representation with Swin Transformer for Object Recognition
    Ren, Jian-Xin
    Xiong, Yu-Jie
    Xie, Xi-Jiong
    Dai, Yu-Fan
    Neural Processing Letters, 2023, 55: 2211-2223
  • [32] SwinSTFM: Remote Sensing Spatiotemporal Fusion Using Swin Transformer
    Chen, Guanyu
    Jiao, Peng
    Hu, Qing
    Xiao, Linjie
    Ye, Zijian
    IEEE Transactions on Geoscience and Remote Sensing, 2022, 60
  • [33] Pre-Locator Incorporating Swin-Transformer Refined Classifier for Traffic Sign Recognition
    Luo, Qiang
    Zheng, Wenbin
    Intelligent Automation and Soft Computing, 2023, 37 (02): 2227-2246
  • [34] Multi Scale SAR Aircraft Detection Based on Swin Transformer and Adaptive Feature Fusion Network
    Ye, Chengjie
    Tian, Jinwen
    Tian, Tian
    IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, 2023: 7058-7061
  • [35] Semantic Segmentation for Remote Sensing Images Based on Swin-Transformer and Multiscale Feature Refinement
    Zhu, Shengyu
    IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, 2023: 6370-6373
  • [36] PolySegNet: improving polyp segmentation through swin transformer and vision transformer fusion
    Lijin, P.
    Ullah, Mohib
    Vats, Anuja
    Cheikh, Faouzi Alaya
    Kumar, G. Santhosh
    Nair, Madhu S.
    Biomedical Engineering Letters, 2024, 14 (06): 1421-1431
  • [37] Improved Object Tracking Algorithm Based on Swin-Transformer
    刘时
    朱明
    Chinese Journal of Liquid Crystals and Displays, 2024, 39 (11): 1569-1580
  • [38] Magnetic Tile Defect Detection Based on Swin-Transformer
    陈荣演
    邱天
    杨创富
    张昕
    陈宇琪
    符晓
    宁洪龙
    Modern Computer, 2023, 29 (09): 68-73
  • [39] Swin-FlowNet: Flow field oriented optimization aided by a CNN and Swin-Transformer based model
    Wang, Xiao
    Zou, Shufan
    Jiang, Yi
    Zhang, Laiping
    Deng, Xiaogang
    Journal of Computational Science, 2023, 72
  • [40] Smaller and more Accurate Swin-transformer Model Prediction for Tracking
    Pan, Fei
    Zhao, Lianyu
    Wang, Chenglin
    2023 IEEE 35th International Conference on Tools with Artificial Intelligence (ICTAI), 2023: 955-961