Semantic-guided spatio-temporal attention for few-shot action recognition

Cited: 1
|
Authors
Wang, Jianyu [1 ]
Liu, Baolin [1 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Few-shot action recognition; Semantic-guided attention mechanism; Multimodal learning; Sequence matching; NETWORKS;
DOI
10.1007/s10489-024-05294-4
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Few-shot action recognition is a challenging problem aimed at learning a model capable of adapting to recognize new categories using only a few labeled videos. Recently, some works have used attention mechanisms to focus on relevant regions and obtain discriminative representations. Despite significant progress, these methods still cannot attain outstanding performance due to insufficient examples and a scarcity of additional supplementary information. In this paper, we propose a novel Semantic-guided Spatio-temporal Attention (SGSTA) approach for few-shot action recognition. The main idea of SGSTA is to exploit the semantic information contained in the text embedding of labels to guide attention toward more accurately capturing the rich spatio-temporal context in videos when visual content is insufficient. Specifically, SGSTA comprises two essential components: a visual-text alignment module and a semantic-guided spatio-temporal attention module. The former aligns visual features and text embeddings to eliminate semantic gaps between them. The latter is further divided into spatial attention and temporal attention. First, semantic-guided spatial attention is applied to the frame feature map to focus on semantically relevant spatial regions. Then, semantic-guided temporal attention encodes the semantically enhanced temporal context with a temporal Transformer. Finally, the resulting spatio-temporal contextual representation is used to learn relationship matching between support and query sequences. In this way, SGSTA can fully utilize rich semantic priors in label embeddings to improve class-specific discriminability and achieve accurate few-shot recognition. Comprehensive experiments on four challenging benchmarks demonstrate that the proposed SGSTA is effective and achieves competitive performance compared with existing state-of-the-art methods under various settings.
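The semantic-guided spatial attention described in the abstract can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' implementation: it assumes the label text embedding has already been projected into the visual feature space by the visual-text alignment module, and uses a simple dot-product relevance score over spatial positions. All function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_guided_spatial_attention(frame_feats, text_emb):
    """Weight each spatial region of every frame by its relevance
    to the label's text embedding.

    frame_feats: (T, HW, D) per-frame spatial feature map, flattened
    text_emb:    (D,) label text embedding, assumed pre-aligned to
                 the visual space by the alignment module
    returns:     (T, D) semantically weighted frame representations
    """
    scores = frame_feats @ text_emb                        # (T, HW) region relevance
    weights = softmax(scores, axis=-1)                     # attention over positions
    return (weights[..., None] * frame_feats).sum(axis=1)  # (T, D)

# toy usage: 8 frames, a 7x7 feature map (49 positions), 64-d features
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 49, 64))
label = rng.standard_normal(64)
out = semantic_guided_spatial_attention(frames, label)
print(out.shape)  # (8, 64)
```

The per-frame outputs would then feed the temporal Transformer for the semantic-guided temporal attention stage, which this sketch omits.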
Pages: 2458 - 2471
Page count: 14
Related Papers
50 results
  • [1] Semantic-guided spatio-temporal attention for few-shot action recognition
    Jianyu Wang
    Baolin Liu
    Applied Intelligence, 2024, 54 : 2458 - 2471
  • [2] Semantic-Guided Relation Propagation Network for Few-shot Action Recognition
    Wang, Xiao
    Ye, Weirong
    Qi, Zhongang
    Zhao, Xun
    Wang, Guangge
    Shan, Ying
    Wang, Hanzi
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 816 - 825
  • [3] Spatio-temporal Relation Modeling for Few-shot Action Recognition
    Thatipelli, Anirudh
    Narayan, Sanath
    Khan, Salman
    Anwer, Rao Muhammad
    Khan, Fahad Shahbaz
    Ghanem, Bernard
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19926 - 19935
  • [4] Searching for Better Spatio-temporal Alignment in Few-Shot Action Recognition
    Cao, Yichao
    Su, Xiu
    Tang, Qingfei
    You, Shan
    Lu, Xiaobo
    Xu, Chang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [5] Spatio-Temporal Self-supervision for Few-Shot Action Recognition
    Yu, Wanchuan
    Guo, Hanyu
    Yan, Yan
    Li, Jie
    Wang, Hanzi
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 84 - 96
  • [6] Cross-modal guides spatio-temporal enrichment network for few-shot action recognition
    Chen, Zhiwen
    Yang, Yi
    Li, Li
    Li, Min
    APPLIED INTELLIGENCE, 2024, 54 (22) : 11196 - 11211
  • [7] Few-Shot Human-Object Interaction Recognition With Semantic-Guided Attentive Prototypes Network
    Ji, Zhong
    Liu, Xiyao
    Pang, Yanwei
    Ouyang, Wangli
    Li, Xuelong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 1648 - 1661
  • [8] Relational Action Bank with Semantic-Visual Attention for Few-Shot Action Recognition
    Liang, Haoming
    Du, Jinze
    Zhang, Hongchen
    Han, Bing
    Ma, Yan
    FUTURE INTERNET, 2023, 15 (03)
  • [9] Elastic temporal alignment for few-shot action recognition
    Pan, Fei
    Xu, Chunlei
    Zhang, Hongjie
    Guo, Jie
    Guo, Yanwen
    IET COMPUTER VISION, 2023, 17 (01) : 39 - 50
  • [10] Semantic-Guided Robustness Tuning for Few-Shot Transfer Across Extreme Domain Shift
    Xiao, Kangyu
    Wang, Zilei
    Li, Junjie
    COMPUTER VISION - ECCV 2024, PT XLIX, 2025, 15107 : 303 - 320