EVA: Enabling Video Attributes With Hierarchical Prompt Tuning for Action Recognition

Authors
Ruan, Xiangning [1]
Yin, Qixiang [1]
Su, Fei [1]
Zhao, Zhicheng [1]
Affiliations
[1] Beijing University of Posts and Telecommunications, School of Artificial Intelligence, Beijing 100876, People's Republic of China
Keywords
Feature extraction; Transformers; Visualization; Tuning; Adaptation models; Streaming media; Semantics; Computational modeling; Accuracy; Dictionaries; Parameter efficient transfer learning; prompt-based learning; action recognition; transformer
DOI
10.1109/LSP.2025.3533307
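Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809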
Abstract
The pretraining and fine-tuning paradigm has excelled in action recognition. However, full fine-tuning is costly in both computation and storage, while parameter-efficient fine-tuning (PEFT) always sacrifices accuracy and stability. To address these challenges, we propose a novel method, Enabling Video Attributes with Hierarchical Prompt Tuning (EVA), to guide action recognition. First, instead of focusing solely on temporal features, EVA sparsely extracts six types of video attributes across two modalities, capturing the relatively gradual attribute changes in actions. Second, a hierarchical prompt tuning architecture with multiscale attribute prompts is introduced to learn the differences between actions. Finally, by adjusting only a small number of additional parameters, EVA outperforms all PEFT methods and most full fine-tuning methods on four widely used datasets (Something-Something V2, ActivityNet, HMDB51, and UCF101), demonstrating its effectiveness.
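The abstract mentions sparse extraction, prompts at multiple scales, and a small number of trainable parameters. The sketch below illustrates those general ideas in plain Python using generic deep prompt tuning; it is not the authors' implementation. All function names, the per-layer prompt lengths, the embedding width of 768, and the backbone size of 86 million parameters are hypothetical values chosen for illustration, since the record does not state any of them.

```python
# Illustrative sketch only: names, shapes, and sizes are assumptions,
# not details taken from the EVA paper.
import random


def sparse_frame_indices(num_frames, num_samples):
    """Pick evenly spaced frame indices (the center of each segment)."""
    if num_samples <= 0 or num_frames <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    segment = num_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


def make_prompts(prompt_lengths, dim, seed=0):
    """Create one block of trainable prompt vectors per layer.

    prompt_lengths[i] is the number of prompt tokens at layer i, so
    different layers can carry prompts at different scales.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(length)]
        for length in prompt_lengths
    ]


def insert_prompts(tokens, layer_prompts):
    """Prepend one layer's prompts to the token sequence of a frozen backbone."""
    return layer_prompts + tokens


def trainable_fraction(prompt_lengths, dim, backbone_params):
    """Share of all parameters that is trained when only prompts are updated."""
    prompt_params = sum(prompt_lengths) * dim
    return prompt_params / (prompt_params + backbone_params)


if __name__ == "__main__":
    dim = 768                      # assumed embedding width
    lengths = [4, 8, 16]           # assumed prompt lengths for three layers
    prompts = make_prompts(lengths, dim)
    tokens = [[0.0] * dim for _ in range(10)]   # stand-in for frozen features
    print(sparse_frame_indices(32, 8))
    print(len(insert_prompts(tokens, prompts[0])))
    print(trainable_fraction(lengths, dim, 86_000_000))
```

Under these assumed sizes the prompts add 28 × 768 = 21,504 trainable values, a tiny share of the frozen backbone, which is the general motivation for prompt-based PEFT.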
Pages: 971-975
Page count: 5