Few-shot Action Recognition via Multi-view Representation Learning

Cited by: 0
Authors
Wang X. [1]
Lu Y. [1]
Yu W. [1]
Pang Y. [2]
Wang H. [1]
Affiliations
[1] School of Informatics, Fujian Key Laboratory of Sensing and Computing for Smart City, Xiamen University, Xiamen
[2] School of Electrical and Information Engineering, Tianjin Key Laboratory of Brain-Inspired Intelligence Technology, Tianjin University, Tianjin
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Circuits and systems; Convolution; Few-shot learning; Meta-learning; Multi-view representation learning; Prototypes; Representation learning; Task analysis; Three-dimensional displays; Training
DOI
10.1109/TCSVT.2024.3384875
Abstract
Few-shot action recognition aims to recognize novel action classes from a limited number of labeled samples and has recently received increasing attention. Its core objective is to enhance the discriminability of feature representations. In this paper, we propose a novel multi-view representation learning network (MRLN) that models intra-video and inter-video relations for few-shot action recognition. First, we propose a spatial-aware aggregation refinement module (SARM), which mainly consists of a spatial-aware aggregation sub-module and a spatial-aware refinement sub-module, to explore the spatial context of samples at the frame level. Second, we design a temporal-channel enhancement module (TCEM), which captures temporal-aware and channel-aware features of samples through a temporal-aware enhancement sub-module and a channel-aware enhancement sub-module. Third, we introduce a cross-video relation module (CVRM), which explores relations across videos using the self-attention mechanism. Moreover, we design a prototype-centered mean absolute error loss to improve the feature learning capability of MRLN. Extensive experiments on four prevalent few-shot action recognition benchmarks show that MRLN significantly outperforms a variety of state-of-the-art few-shot action recognition methods. In particular, in the 5-way 1-shot setting, MRLN achieves 75.7%, 86.9%, 65.5%, and 45.9% on the Kinetics, UCF101, HMDB51, and SSv2 datasets, respectively.
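The abstract names a 5-way 1-shot setting and a prototype-centered mean absolute error loss but does not define either in detail. The sketch below is one plausible reading, not the authors' implementation: class prototypes are taken as the mean of the support features for each class, queries are assigned to the nearest prototype, and the loss is assumed to be the average absolute difference between each query feature and the prototype of its own class. The function names, the toy one-hot features, and the exact form of the loss are all assumptions made for illustration; in MRLN the features would come from the SARM, TCEM, and CVRM modules.

```python
import random

def mean_vector(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def mae(a, b):
    """Mean absolute error between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def build_prototypes(support):
    """support maps a class label to its K support feature vectors."""
    return {label: mean_vector(feats) for label, feats in support.items()}

def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest under MAE."""
    return min(prototypes, key=lambda label: mae(query, prototypes[label]))

def prototype_centered_mae_loss(queries, labels, prototypes):
    """Average MAE between each query and its own class prototype.

    Assumed form only: the abstract names the loss but does not define it.
    """
    total = sum(mae(q, prototypes[y]) for q, y in zip(queries, labels))
    return total / len(queries)

# Toy 5-way 1-shot episode; the features are stand-ins, not video features.
rng = random.Random(0)
n_way, k_shot, dim = 5, 1, 5

def sample(label):
    """Scaled one-hot class center plus small uniform noise (hypothetical data)."""
    return [5.0 * (j == label) + rng.uniform(-0.5, 0.5) for j in range(dim)]

support = {label: [sample(label) for _ in range(k_shot)] for label in range(n_way)}
query_labels = [label for label in range(n_way) for _ in range(2)]
queries = [sample(label) for label in query_labels]

prototypes = build_prototypes(support)
predictions = [classify(q, prototypes) for q in queries]
loss = prototype_centered_mae_loss(queries, query_labels, prototypes)
print(predictions == query_labels, round(loss, 3))
```

With one shot per class, each prototype is simply the single support feature, which is why the quality of the learned representation matters so much in this setting.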