FTAN: Frame-to-frame temporal alignment network with contrastive learning for few-shot action recognition

Authors
Yu, Bin [1]
Hou, Yonghong [2]
Guo, Zihui [3]
Gao, Zhiyi [2]
Li, Yueyang [2]
Affiliations
[1] Tianjin Univ, Tianjin Int Engn Inst, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Sch Elect Automat & Informat Engn, Tianjin 300072, Peoples R China
[3] Tianjin Chengjian Univ, Sch Comp & Informat Engn, Tianjin 300384, Peoples R China
Keywords
Few-shot action recognition; Distance metric; Temporal alignment; Contrastive objectives
DOI
10.1016/j.imavis.2024.105159
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most current few-shot action recognition approaches follow the metric learning paradigm, classifying actions by measuring distances between arbitrary sub-sequences (frames, frame combinations, or clips) of different actions. However, such a disordered distance metric between action sub-sequences ignores the long-term temporal relations of actions, which can cause significant metric deviations. Moreover, the distance metric is affected by the distinct temporal distributions of different actions, including intra-class temporal offsets and inter-class local similarity. In this paper, a novel few-shot action recognition framework, the Frame-to-frame Temporal Alignment Network (FTAN), is proposed to address these challenges. Specifically, an attention-based temporal alignment (ATA) module is devised to calculate the distance between corresponding frames of different actions along the temporal dimension, achieving frame-to-frame temporal alignment. Meanwhile, a Temporal Context Module (TCM) is proposed to increase inter-class diversity by enriching frame-level feature representations, and a Frames Cyclic Shift Module (FCSM) performs frame-level temporal cyclic shifts to reduce intra-class inconsistency. In addition, we present temporal and global contrastive objectives to help learn discriminative and class-agnostic visual features. Experimental results show that the proposed architecture achieves state-of-the-art performance on the HMDB51, UCF101, Something-Something V2, and Kinetics-100 datasets.
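The abstract contrasts a "disordered" distance between arbitrary sub-sequences with a distance computed between corresponding frames along the temporal dimension. Below is a minimal NumPy sketch of that contrast. Several assumptions are ours, not the paper's: Euclidean frame distance, nearest-frame matching as the disordered baseline, the hypothetical function names, and the toy features. The sketch is not the authors' ATA module, which is attention-based and is not reproduced here. The `cyclic_shift` helper only illustrates what a frame-level temporal cyclic shift is. It does not describe how FCSM uses such shifts.

```python
import numpy as np


def frame_to_frame_distance(query: np.ndarray, support: np.ndarray) -> float:
    """Mean Euclidean distance between temporally corresponding frames.

    query, support: arrays of shape (T, D) holding T frame-level features.
    Frame t of the query is compared only with frame t of the support.
    """
    if query.shape != support.shape:
        raise ValueError("query and support must both have shape (T, D)")
    return float(np.linalg.norm(query - support, axis=1).mean())


def disordered_distance(query: np.ndarray, support: np.ndarray) -> float:
    """Hypothetical order-agnostic baseline: each query frame is matched to
    its nearest support frame, regardless of temporal position."""
    # Pairwise frame distances via broadcasting, shape (T, T)
    pairwise = np.linalg.norm(query[:, None, :] - support[None, :, :], axis=2)
    return float(pairwise.min(axis=1).mean())


def cyclic_shift(frames: np.ndarray, k: int) -> np.ndarray:
    """Frame-level temporal cyclic shift: frame t moves to position (t + k) mod T."""
    return np.roll(frames, shift=k, axis=0)


# Toy example (made-up data): 4 frames with 3-dimensional features
action = np.arange(12, dtype=float).reshape(4, 3)
reversed_action = action[::-1]  # same frames, opposite temporal order

print(disordered_distance(reversed_action, action))      # 0.0: order is ignored
print(frame_to_frame_distance(reversed_action, action))  # 6*sqrt(3) ≈ 10.39: order matters

shifted = cyclic_shift(action, 1)  # a cyclic temporal offset of one frame
print(frame_to_frame_distance(cyclic_shift(shifted, -1), action))  # 0.0 after undoing the shift
```

In the toy example, the order-agnostic metric cannot separate an action from its temporally reversed copy, while the frame-to-frame metric can. This is one way to read the abstract's point that disordered metrics ignore long-term temporal relations.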
Pages: 10