EVA: Enabling Video Attributes With Hierarchical Prompt Tuning for Action Recognition

Times cited: 0
Authors
Ruan, Xiangning [1]
Yin, Qixiang [1]
Su, Fei [1]
Zhao, Zhicheng [1]
Affiliations
[1] Beijing University of Posts and Telecommunications, School of Artificial Intelligence, Beijing 100876, People's Republic of China
Keywords
Feature extraction; Transformers; Visualization; Tuning; Adaptation models; Streaming media; Semantics; Computational modeling; Accuracy; Dictionaries; Parameter efficient transfer learning; prompt-based learning; action recognition; transformer
DOI
10.1109/LSP.2025.3533307
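Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract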
The pretraining and fine-tuning paradigm has excelled in action recognition. However, full fine-tuning is costly in both computation and storage, while parameter-efficient fine-tuning (PEFT) always sacrifices accuracy and stability. To address these challenges, we propose a novel method, Enabling Video Attributes with Hierarchical Prompt Tuning (EVA), to guide action recognition. First, instead of focusing solely on temporal features, EVA sparsely extracts six types of video attributes across two modalities, capturing the relatively gradual attribute changes in actions. Second, a hierarchical prompt tuning architecture with multiscale attribute prompts is introduced to learn the differences between actions. Finally, by adjusting only a small number of additional parameters, EVA outperforms all PEFT and most full fine-tuning methods across four widely used datasets (Something-Something V2, ActivityNet, HMDB51, and UCF101), demonstrating its effectiveness.
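The abstract's point about "a small number of additional parameters" can be made concrete with some rough arithmetic. The sketch below is a generic illustration of per-layer prompt tuning on a frozen backbone, not the EVA architecture: the function names, the 12-layer, 768-wide backbone with about 86M weights, and the 8 prompts per layer are all assumptions chosen for illustration and are not taken from the paper.

```python
# Illustrative arithmetic only: generic per-layer prompt tuning, not EVA itself.
# All sizes below are hypothetical and are not reported in the abstract.

def prompt_param_count(num_layers: int, prompts_per_layer: int, embed_dim: int) -> int:
    """Number of learnable values when every layer gets its own prompt tokens."""
    return num_layers * prompts_per_layer * embed_dim


def trainable_fraction(prompt_params: int, backbone_params: int) -> float:
    """Share of all parameters that is trained when the backbone stays frozen."""
    return prompt_params / (backbone_params + prompt_params)


# Assumed ViT-B-like backbone: 12 layers, width 768, roughly 86M frozen weights.
BACKBONE_PARAMS = 86_000_000

prompts = prompt_param_count(num_layers=12, prompts_per_layer=8, embed_dim=768)
fraction = trainable_fraction(prompts, BACKBONE_PARAMS)

print(prompts)              # 73728 learnable prompt values
print(f"{fraction:.4%}")    # trained share of the total parameter count
```

Under these assumed sizes the trained share is well below one percent of the model, which is the kind of saving that motivates PEFT over full fine-tuning. The actual parameter budget of EVA would have to be taken from the paper.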
Pages: 971-975
Page count: 5
Related papers
50 in total
  • [31] Hierarchical Prompt Tuning for Few-Shot Multi-Task Learning
    Liu, Jingping
    Chen, Tao
    Liang, Zujie
    Jiang, Haiyun
    Xiao, Yanghua
    Wei, Feng
    Qian, Yuxi
    Hao, Zhenghong
    Han, Bing
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 1556 - 1565
  • [32] Video Character Recognition Through Hierarchical Classification
    Shivakumara, Palaiahnakote
    Trung Quy Phan
    Lu, Shijian
    Tan, Chew Lim
    11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 2011, : 131 - 135
  • [33] A hierarchical Transformer network for smoke video recognition
    Cheng, Guangtao
    Xian, Baoyi
    Liu, Yifan
    Chen, Xue
    Hu, Lianjun
    Song, Zhanjie
    DIGITAL SIGNAL PROCESSING, 2025, 158
  • [34] Hierarchical Context Modeling for Video Event Recognition
    Wang, Xiaoyang
    Ji, Qiang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (09) : 1770 - 1782
  • [35] Refining Action Segmentation with Hierarchical Video Representations
    Ahn, Hyemin
    Lee, Dongheui
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 16282 - 16290
  • [36] Coupling Video Segmentation and Action Recognition
    Ghodrati, Amir
    Pedersoli, Marco
    Tuytelaars, Tinne
    2014 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2014, : 618 - 625
  • [37] Breaking video into pieces for action recognition
    Ying Zheng
    Hongxun Yao
    Xiaoshuai Sun
    Xuesong Jiang
    Fatih Porikli
    Multimedia Tools and Applications, 2017, 76 : 22195 - 22212
  • [38] Action recognition in broadcast tennis video
    Zhu, Guangyu
    Xu, Changsheng
    Huang, Qingming
    Gao, Wen
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 251 - +
  • [39] Video Action Retrieval Using Action Recognition Model
    Iinuma, Yuko
    Satoh, Shin'ichi
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 603 - 606
  • [40] Modeling Video Evolution For Action Recognition
    Fernando, Basura
    Gavves, Efstratios
    Oramas, Jose M.
    Ghodrati, Amir
    Tuytelaars, Tinne
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 5378 - 5387