FTAN: Frame-to-frame temporal alignment network with contrastive learning for few-shot action recognition

Cited by: 0
Authors
Yu, Bin [1]
Hou, Yonghong [2]
Guo, Zihui [3]
Gao, Zhiyi [2]
Li, Yueyang [2]
Affiliations
[1] Tianjin Univ, Tianjin Int Engn Inst, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Sch Elect Automat & Informat Engn, Tianjin 300072, Peoples R China
[3] Tianjin Chengjian Univ, Sch Comp & Informat Engn, Tianjin 300384, Peoples R China
Keywords
Few-shot action recognition; Distance metric; Temporal alignment; Contrastive objectives
DOI
10.1016/j.imavis.2024.105159
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Most current few-shot action recognition approaches follow the metric learning paradigm, measuring the distance between sub-sequences (frames, arbitrary frame combinations, or clips) of different actions for classification. However, this disordered distance metric between action sub-sequences ignores the long-term temporal relations of actions, which may result in significant metric deviations. Moreover, the distance metric suffers from the distinctive temporal distributions of different actions, including intra-class temporal offsets and inter-class local similarity. In this paper, a novel few-shot action recognition framework, the Frame-to-frame Temporal Alignment Network (FTAN), is proposed to address these challenges. Specifically, an attention-based temporal alignment (ATA) module is devised to compute the distance between corresponding frames of different actions along the temporal dimension, achieving frame-to-frame temporal alignment. Meanwhile, a Temporal Context Module (TCM) is proposed to increase inter-class diversity by enriching the frame-level feature representation, and a Frames Cyclic Shift Module (FCSM) performs frame-level temporal cyclic shifts to reduce intra-class inconsistency. In addition, we present temporal and global contrastive objectives to assist in learning discriminative and class-agnostic visual features. Experimental results show that the proposed architecture achieves state-of-the-art performance on the HMDB51, UCF101, Something-Something V2, and Kinetics-100 datasets.
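The abstract describes two concrete mechanisms: a distance computed between temporally corresponding frames of two videos (frame-to-frame alignment) and a frame-level temporal cyclic shift intended to absorb intra-class temporal offsets. The short PyTorch sketch below illustrates only those two ideas at the level the abstract states them; it is not the authors' FTAN implementation, and the tensor shapes, the choice of cosine distance as the per-frame metric, and the names frame_to_frame_distance and cyclic_shift are assumptions made for illustration.

import torch
import torch.nn.functional as F

def frame_to_frame_distance(query, support):
    """Distance between temporally corresponding frames (illustrative).

    query, support: (T, C) frame-level features of two videos sampled to the
    same number of frames T. Frame t of the query is compared only with frame
    t of the support, preserving long-term temporal order instead of matching
    arbitrary sub-sequences.
    """
    q = F.normalize(query, dim=-1)
    s = F.normalize(support, dim=-1)
    # cosine distance between corresponding frames, averaged over time
    return (1.0 - (q * s).sum(dim=-1)).mean()

def cyclic_shift(frames, offset):
    """Frame-level temporal cyclic shift (assumed FCSM-style behavior).

    Rolls the T frames by `offset` positions, here standing in for the
    compensation of intra-class temporal offsets before alignment.
    """
    return torch.roll(frames, shifts=offset, dims=0)

# Toy usage: a cyclically shifted copy, rolled back, matches the original.
T, C = 8, 256
video = torch.randn(T, C)
shifted = cyclic_shift(video, offset=3)
print(frame_to_frame_distance(video, cyclic_shift(shifted, offset=-3)))  # ~0

In a real pipeline the shift amount and any attention-based re-weighting of per-frame distances would be predicted from the features themselves; the fixed offset here only keeps the example self-contained.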
Pages: 10
Related papers (50 total)
  • [31] Contrastive Representation for Few-Shot Vehicle Footprint Recognition. Wang, Yongxiong; Hu, Chuanfei; Wang, Guangpeng; Lin, Xu. 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2021.
  • [32] FedFSLAR: A Federated Learning Framework for Few-shot Action Recognition. Nguyen Anh Tu; Abu, Assanali; Aikyn, Nartay; Makhanov, Nursultan; Lee, Min-Ho; Khiem Le-Huy; Wong, Kok-Seng. 2024 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW 2024), 2024: 270-279.
  • [33] TA2N: Two-Stage Action Alignment Network for Few-Shot Action Recognition. Li, Shuyuan; Liu, Huabin; Qian, Rui; Li, Yuxi; See, John; Fei, Mengjuan; Yu, Xiaoyuan; Lin, Weiyao. Thirty-Sixth AAAI Conference on Artificial Intelligence / Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence / Twelfth Symposium on Educational Advances in Artificial Intelligence, 2022: 1404-1411.
  • [34] Frame-Level Embedding Learning for Few-shot Bioacoustic Event Detection. Zhang, Xueyang; Wang, Shuxian; Du, Jun; Yan, Genwei; Tang, Jigang; Gao, Tian; Fang, Xin; Pan, Jia; Gao, Jianqing. 2023 IEEE International Conference on Multimedia and Expo (ICME), 2023: 750-755.
  • [35] Dynamic Temporal Shift Feature Enhancement for Few-Shot Action Recognition. Li, Haibo; Zhang, Bingbing; Ma, Yuanchen; Guo, Qiang; Zhang, Jianxin; Zhang, Qiang. Pattern Recognition and Computer Vision (PRCV 2024), Pt X, 2025, 15040: 471-484.
  • [36] Spatio-temporal Relation Modeling for Few-shot Action Recognition. Thatipelli, Anirudh; Narayan, Sanath; Khan, Salman; Anwer, Rao Muhammad; Khan, Fahad Shahbaz; Ghanem, Bernard. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 19926-19935.
  • [37] Few-shot learning for ear recognition. Zhang, Jie; Yu, Wen; Yang, Xudong; Deng, Fang. Proceedings of the 2019 International Conference on Image, Video and Signal Processing (IVSP 2019), 2019: 50-54.
  • [38] Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning. Wang, Jiahao; Wang, Yunhong; Liu, Sheng; Li, Annan. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 582-591.
  • [39] Contrastive prototype loss based discriminative feature network for few-shot learning. Yan, Leilei; He, Feihong; Zheng, Xiaohan; Zhang, Li; Zhang, Yiqi; He, Jiangzhen; Du, Weidong; Wang, Yansong; Li, Fanzhang. Applied Intelligence, 2025, 55 (05).
  • [40] Multimodal variational contrastive learning for few-shot classification. Pan, Meihong; Shen, Hongbin. Applied Intelligence, 2024, 54 (02): 1879-1892.