FTAN: Frame-to-frame temporal alignment network with contrastive learning for few-shot action recognition

Authors
Yu, Bin [1]
Hou, Yonghong [2]
Guo, Zihui [3]
Gao, Zhiyi [2]
Li, Yueyang [2]
Affiliations
[1] Tianjin Univ, Tianjin Int Engn Inst, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Sch Elect Automat & Informat Engn, Tianjin 300072, Peoples R China
[3] Tianjin Chengjian Univ, Sch Comp & Informat Engn, Tianjin 300384, Peoples R China
Keywords
Few-shot action recognition; Distance metric; Temporal alignment; Contrastive objectives
DOI
10.1016/j.imavis.2024.105159
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most current few-shot action recognition approaches follow the metric learning paradigm: they classify by measuring the distance between arbitrary sub-sequences (frames, frame combinations, or clips) of different actions. However, this unordered distance metric between action sub-sequences ignores the long-term temporal relations of actions, which may result in significant metric deviations. Moreover, the distance metric suffers from the distinctive temporal distributions of different actions, including intra-class temporal offsets and inter-class local similarity. In this paper, a novel few-shot action recognition framework, the Frame-to-frame Temporal Alignment Network (FTAN), is proposed to address these challenges. Specifically, an attention-based temporal alignment (ATA) module is devised to calculate the distance between corresponding frames of different actions along the temporal dimension, achieving frame-to-frame temporal alignment. Meanwhile, the Temporal Context Module (TCM) is proposed to increase inter-class diversity by enriching the frame-level feature representation, and the Frames Cyclic Shift Module (FCSM) performs a frame-level temporal cyclic shift to reduce intra-class inconsistency. In addition, we present temporal and global contrastive objectives to assist in learning discriminative and class-agnostic visual features. Experimental results show that the proposed architecture achieves state-of-the-art performance on the HMDB51, UCF101, Something-Something V2, and Kinetics-100 datasets.
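To make the idea of a frame-to-frame distance and a temporal cyclic shift concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the attention mechanism of ATA or how FCSM is applied, so the sketch simply compares frames at the same temporal index with cosine distance and takes the minimum over all cyclic shifts of the query. The function names (`frame_to_frame_distance`, `cyclic_shift`, `shifted_distance`) and the toy 2-D frame features are assumptions made for illustration only.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def frame_to_frame_distance(query, support):
    """Mean distance between frames that share the same temporal index.

    Unlike an unordered sub-sequence metric, frame t of the query is only
    compared with frame t of the support video.
    """
    if len(query) != len(support):
        raise ValueError("both videos must have the same number of frames")
    total = sum(cosine_distance(q, s) for q, s in zip(query, support))
    return total / len(query)


def cyclic_shift(frames, k):
    """Rotate the frame sequence by k positions along the temporal axis."""
    k %= len(frames)
    return frames[-k:] + frames[:-k] if k else list(frames)


def shifted_distance(query, support):
    """Smallest frame-to-frame distance over all cyclic shifts of the query.

    This is an illustrative stand-in for compensating a temporal offset;
    the paper's FCSM may work differently.
    """
    return min(
        frame_to_frame_distance(cyclic_shift(query, k), support)
        for k in range(len(query))
    )


# Toy example: the query is the support video delayed by one frame.
support = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
query = cyclic_shift(support, 1)

unaligned = frame_to_frame_distance(query, support)
aligned = shifted_distance(query, support)
print(f"unaligned: {unaligned:.3f}, best cyclic shift: {aligned:.3f}")
```

In this toy case the same action with a one-frame offset looks maximally dissimilar under a strict index-by-index comparison, while searching over cyclic shifts recovers the match. This illustrates the intra-class temporal offset problem the abstract describes, under the simplifying assumptions stated above.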
Pages: 10