Text-guided Graph Temporal Modeling for few-shot video classification

Authors
Deng, Fuqin [1, 6, 7]
Zhong, Jiaming [1, 3]
Li, Nannan [2]
Fu, Lanhui [1]
Jiang, Bingchun [3]
Yi, Ningbo [5]
Qi, Feng [4]
Xin, He [4]
Lam, Tin Lun [7]
Affiliations
[1] Wuyi Univ, Sch Elect & Informat Engn, Jiangmen, Peoples R China
[2] Macau Univ Sci & Technol, Fac Innovat Engn, Sch Comp Sci & Engn, Macau, Peoples R China
[3] Guangdong Univ Sci & Technol, Sch Mech & Elect Engn, Dongguan, Peoples R China
[4] Wuyi Univ, Sch Appl Phys & Mat Sci, Jiangmen, Peoples R China
[5] Wuyi Univ, Sch Text Mat & Engn, Jiangmen, Peoples R China
[6] Shenzhen Vatop Semicon Tech Co Ltd, Shenzhen, Peoples R China
[7] Chinese Univ Hong Kong, Shenzhen Inst Artificial Intelligence & Robot Soc, Sch Sci & Engn, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Few-shot video classification; Multi-modal learning; Large model application; Graph Temporal Network
DOI
10.1016/j.engappai.2024.109076
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Large-scale pre-trained models and graph neural networks have recently demonstrated remarkable success in few-shot video classification tasks. However, they generally suffer from two key limitations: (i) the temporal relations between adjacent frames tend to be ambiguous due to the lack of explicit temporal modeling; and (ii) the absence of multi-modal semantic knowledge in query videos results in inaccurate prototype construction and an inability to compute multi-modal temporal alignment metrics. To address these issues, we develop a Text-guided Graph Temporal Modeling (TgGTM) method that consists of two crucial components: a text-guided feature refinement module and a learnable Query text-token contrastive objective. Specifically, the former leverages a temporal masking layer to guide the model in learning temporal relationships between adjacent frames. Additionally, it utilizes multi-modal information to refine video prototypes for comprehensive few-shot video classification. The latter addresses the feature discrepancy between multi-modal support features and single-modal query features by aligning a learnable Query text-token with the corresponding base-class text descriptions. Extensive experiments on four commonly used benchmarks demonstrate the effectiveness of the proposed method, which achieves mean accuracies of 54.4%, 80.3%, 91.9%, and 96.2% for 5-way 1-shot classification on SSV2-Small, HMDB51, Kinetics, and UCF101, respectively. These results surpass those of existing state-of-the-art methods. A detailed ablation study shows the importance of learning temporal relationships between adjacent frames and of obtaining the Query text-token. The source code and models will be publicly available at https://github.com/JiaMingZhong2621/TgGTM.
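For readers unfamiliar with the 5-way 1-shot setting and with prototypes refined by text, the following is a minimal illustrative sketch. It is not the TgGTM implementation: the function names, the mixing weight `alpha`, and the toy one-hot features are all assumptions made for illustration. It also uses plain mean pooling over frames, which discards frame order; that is exactly the missing temporal modeling the abstract identifies as a limitation.

```python
import math

def mean_pool(vectors):
    """Average a list of equal-length feature vectors into one vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def build_prototypes(support, text_emb, alpha=0.5):
    """Build one prototype per class from support videos and a class text embedding.

    support:  {class_name: [video, ...]}, each video a list of frame vectors.
    text_emb: {class_name: vector} (hypothetical text features).
    alpha:    hypothetical weight of the text embedding in the prototype.
    """
    prototypes = {}
    for name, videos in support.items():
        visual = l2_normalize(mean_pool([mean_pool(v) for v in videos]))
        text = l2_normalize(text_emb[name])
        prototypes[name] = [(1 - alpha) * x + alpha * t
                            for x, t in zip(visual, text)]
    return prototypes

def classify(query_frames, prototypes):
    """Assign the query video to the class with the most similar prototype."""
    q = mean_pool(query_frames)
    return max(prototypes, key=lambda name: cosine(q, prototypes[name]))

def one_hot(i, dim=5):
    return [1.0 if j == i else 0.0 for j in range(dim)]

# Toy 5-way 1-shot episode: five classes, one two-frame support video each.
classes = ["c0", "c1", "c2", "c3", "c4"]
support = {c: [[one_hot(i), one_hot(i)]] for i, c in enumerate(classes)}
text_emb = {c: one_hot(i) for i, c in enumerate(classes)}
prototypes = build_prototypes(support, text_emb)

# A query video whose frames lie mostly along the third feature axis.
query = [[0.1, 0.0, 0.9, 0.0, 0.0], [0.0, 0.1, 0.8, 0.1, 0.0]]
prediction = classify(query, prototypes)
```

In this sketch the query is compared using visual features only, while the prototypes mix visual and text features. That mismatch between multi-modal support features and single-modal query features is the discrepancy the abstract says the learnable Query text-token is meant to address.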
Pages: 12