Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning

Cited by: 15

Authors:

Zheng, Sipeng [1]

Chen, Shizhe [2]

Jin, Qin [1]

Affiliations:

[1] Renmin Univ China, Beijing, Peoples R China

[2] INRIA, Paris, France

Source:

COMPUTER VISION - ECCV 2022, PT IV | 2022 / Vol. 13664

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China

Keywords:

Few-shot learning; Action recognition; Contrastive learning;

DOI:

10.1007/978-3-031-19772-7_18

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Few-shot action recognition aims to recognize actions in test videos based on limited annotated data of target action classes. The dominant approaches project videos into a metric space and classify videos via nearest neighboring. They mainly measure video similarities using global or temporal alignment alone, while an optimum matching should be multi-level. However, the complexity of learning coarse-to-fine matching quickly rises as we focus on finer-grained visual cues, and the lack of detailed local supervision is another challenge. In this work, we propose a hierarchical matching model to support comprehensive similarity measure at global, temporal and spatial levels via a zoom-in matching module. We further propose a mixed-supervised hierarchical contrastive learning (HCL), which not only employs supervised contrastive learning to differentiate videos at different levels, but also utilizes cycle consistency as weak supervision to align discriminative temporal clips or spatial patches. Our model achieves state-of-the-art performance on four benchmarks especially under the most challenging 1-shot recognition setting.
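The metric-space baseline that the abstract describes (embed videos, then classify a query by its nearest labeled neighbor) can be sketched as follows. This is only an illustration of global-level nearest-neighbor matching, not the hierarchical matching model or the HCL objective proposed in the paper. The function names, the use of cosine similarity, and the three-dimensional embeddings are assumptions made for the example; in practice the embeddings would come from a trained video encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_query(query, support):
    """Return the label of the support video most similar to the query.

    `support` maps each class label to a list of embeddings
    (one embedding per labeled example, so 1-shot means a list of length 1).
    """
    best_label, best_score = None, -math.inf
    for label, embeddings in support.items():
        # Score a class by its single closest support example.
        score = max(cosine_similarity(query, e) for e in embeddings)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical 1-shot, 2-way episode with hand-written placeholder embeddings.
support = {
    "jump": [[1.0, 0.1, 0.0]],
    "run": [[0.0, 1.0, 0.2]],
}
query = [0.9, 0.2, 0.1]
predicted = classify_query(query, support)
```

Here each video is reduced to a single global vector. The abstract argues that this is insufficient on its own and that similarity should also be measured at the temporal (clip) and spatial (patch) levels.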

Pages: 297-313

Page count: 17
