Swin-Fusion: Swin-Transformer with Feature Fusion for Human Action Recognition

Cited by: 10
Authors
Chen, Tiansheng [1]
Mo, Lingfei [1]
Affiliation
[1] Southeast Univ, Sch Instrument Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Swin-Transformer; Feature pyramid; Image classification; Network
DOI
10.1007/s11063-023-11367-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Human action recognition based on still images is one of the most challenging computer vision tasks. Over the past decade, convolutional neural networks (CNNs) have developed rapidly and performed well on still-image human action recognition. However, because CNNs lack long-range (global) perception, it is difficult for them to capture the global structure of human behavior and the overall relationship between the behavior and its environment. Recently, transformer-based models have gained prominence in computer vision, reaching state-of-the-art (SOTA) results on several vision tasks. We explore the transformer's capability for still-image human action recognition and add a simple but effective feature fusion module to the Swin-Transformer model. Specifically, we propose a new transformer-based model for behavioral feature extraction that uses a pre-trained Swin-Transformer as the backbone network. Swin-Transformer's distinctive hierarchical structure, combined with the feature fusion module, is used to extract and fuse multi-scale behavioral information. Extensive experiments were conducted on five still-image human action recognition datasets: Li's action dataset, the Stanford-40 dataset, the PPMI-24 dataset, the AUC-V1 dataset, and the AUC-V2 dataset. The results indicate that, by sharing and reusing feature maps of different scales across multiple stages, the proposed Swin-Fusion model achieves better behavior recognition than previously improved CNN-based models, without modifying the original backbone training method and while increasing training resources by only 1.6%. The code and models will be available at https://github.com/cts4444/Swin-Fusion.
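The abstract does not describe the internals of the fusion module, so the following is only a minimal sketch under stated assumptions, not the authors' Swin-Fusion implementation. It assumes an FPN-style top-down fusion (suggested by the "Feature pyramid" keyword but not confirmed by this record) over the four hierarchical Swin-Transformer stages. Stage shapes follow the standard Swin-T configuration (224×224 input, patch size 4, embedding dimension 96, with each patch-merging step halving H and W and doubling C). The helper names (`swin_stage_shapes`, `upsample2x`, `top_down_fuse`, `fused_descriptor`) are hypothetical, and each stage map is reduced to a single channel, as if it had already passed through a lateral projection, to keep the example dependency-free.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one feature channel: H rows x W columns


def swin_stage_shapes(img_size: int = 224, patch_size: int = 4,
                      embed_dim: int = 96, num_stages: int = 4) -> List[Tuple[int, int, int]]:
    """Return (H, W, C) for each Swin stage output (Swin-T defaults assumed).

    Stage 1 sees (img_size / patch_size)^2 tokens; every later stage's
    patch merging halves H and W and doubles C.
    """
    side = img_size // patch_size
    shapes = []
    for stage in range(num_stages):
        shapes.append((side, side, embed_dim * 2 ** stage))
        side //= 2
    return shapes


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour upsampling by a factor of 2 in H and W."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two grids that must have identical shapes."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("feature maps must have the same spatial size")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def global_avg_pool(grid: Grid) -> float:
    """Average every spatial position into a single value."""
    return sum(map(sum, grid)) / (len(grid) * len(grid[0]))


def top_down_fuse(stages: List[Grid]) -> List[Grid]:
    """Hypothetical FPN-style fusion; stages[0] is finest, stages[-1] coarsest.

    Each fused coarser map is upsampled and added to the next finer map,
    so every output level reuses information from all coarser stages.
    """
    fused = [stages[-1]]
    for finer in reversed(stages[:-1]):
        fused.insert(0, add(finer, upsample2x(fused[0])))
    return fused


def fused_descriptor(stages: List[Grid]) -> List[float]:
    """Pool each fused level to one value and concatenate them."""
    return [global_avg_pool(g) for g in top_down_fuse(stages)]


if __name__ == "__main__":
    print(swin_stage_shapes())  # [(56, 56, 96), (28, 28, 192), (14, 14, 384), (7, 7, 768)]
    # Toy single-channel maps at scaled-down sizes 8, 4, 2, 1, all filled with 1.0.
    toy = [[[1.0] * s for _ in range(s)] for s in (8, 4, 2, 1)]
    print(fused_descriptor(toy))  # [4.0, 3.0, 2.0, 1.0]
```

In the toy run, the finest level accumulates contributions from all four stages (value 4.0) while the coarsest keeps only its own (1.0), which illustrates how a top-down pathway lets high-resolution maps reuse coarse, more global context. A real implementation would operate on multi-channel tensors and would typically use learned projections rather than plain addition.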
Pages: 11109-11130
Number of pages: 22