EVA: Enabling Video Attributes With Hierarchical Prompt Tuning for Action Recognition

Authors
Ruan, Xiangning [1]
Yin, Qixiang [1]
Su, Fei [1]
Zhao, Zhicheng [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China
Keywords
Feature extraction; Transformers; Visualization; Tuning; Adaptation models; Streaming media; Semantics; Computational modeling; Accuracy; Dictionaries; Parameter efficient transfer learning; prompt-based learning; action recognition; transformer
DOI
10.1109/LSP.2025.3533307
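Chinese Library Classification (CLC) Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Classification Code
0808; 0809
Abstract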
The pretraining and fine-tuning paradigm has excelled in action recognition. However, full fine-tuning is costly in both computation and storage, while parameter-efficient fine-tuning (PEFT) always sacrifices accuracy and stability. To address these challenges, we propose a novel method, Enabling Video Attributes with Hierarchical Prompt Tuning (EVA), to guide action recognition. First, instead of focusing solely on temporal features, EVA sparsely extracts six types of video attributes across two modalities, capturing the relatively gradual attribute changes in actions. Second, a hierarchical prompt tuning architecture with multiscale attribute prompts is introduced to learn the differences between actions. Finally, by adjusting only a small number of additional parameters, EVA outperforms all PEFT methods and most full fine-tuning methods on four widely used datasets (Something-Something V2, ActivityNet, HMDB51, and UCF101), demonstrating its effectiveness.
Pages: 971-975
Page count: 5