EVA: Enabling Video Attributes With Hierarchical Prompt Tuning for Action Recognition

Authors
Ruan, Xiangning [1]
Yin, Qixiang [1]
Su, Fei [1]
Zhao, Zhicheng [1]
Affiliations
[1] Beijing University of Posts and Telecommunications, School of Artificial Intelligence, Beijing 100876, People's Republic of China
Keywords
Feature extraction; Transformers; Visualization; Tuning; Adaptation models; Streaming media; Semantics; Computational modeling; Accuracy; Dictionaries; Parameter-efficient transfer learning; Prompt-based learning; Action recognition; Transformer
DOI
10.1109/LSP.2025.3533307
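Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract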
The pretraining and fine-tuning paradigm has excelled in action recognition. However, full fine-tuning is costly in both computation and storage, while parameter-efficient fine-tuning (PEFT) always sacrifices accuracy and stability. To address these challenges, we propose a novel method, Enabling Video Attributes with Hierarchical Prompt Tuning (EVA), to guide action recognition. First, instead of focusing solely on temporal features, EVA sparsely extracts six types of video attributes across two modalities, capturing the relatively gradual attribute changes in actions. Second, a hierarchical prompt tuning architecture with multiscale attribute prompts is introduced to learn the differences between actions. Finally, by tuning only a small number of additional parameters, EVA outperforms all PEFT and most full fine-tuning methods across four widely used datasets (Something-Something V2, ActivityNet, HMDB51, and UCF101), demonstrating its effectiveness.
Pages: 971-975
Page count: 5