Knowledge-guided pre-training and fine-tuning: Video representation learning for action recognition

Cited by: 3
Authors
Wang, Guanhong [1 ,2 ]
Zhou, Yang [1 ]
He, Zhanhao [1 ]
Lu, Keyu [1 ]
Feng, Yang [3 ]
Liu, Zuozhu [1 ,2 ]
Wang, Gaoang [1 ,2 ]
Affiliations
[1] Zhejiang Univ, Zhejiang Univ Univ Illinois Urbana Champaign Inst, Haining 314400, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[3] Angelalign Inc, Angelalign Res Inst, Shanghai 200011, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Video representation learning; Knowledge distillation; Action recognition; Video retrieval;
DOI
10.1016/j.neucom.2023.127136
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video-based action recognition is an important task in computer vision, aiming to extract rich spatio-temporal information from videos in order to recognize human actions. Many approaches adopt self-supervised learning on large-scale unlabeled datasets and then transfer the learned representations to the downstream action recognition task. Although video representation learning has driven considerable progress in action recognition, two main issues remain for most existing methods. First, pre-training with self-supervised pretext tasks usually yields neutral representations that carry little information useful for downstream action recognition. Second, the valuable knowledge learned from large-scale pre-training datasets is gradually forgotten during fine-tuning. To address these issues, we propose a novel video representation learning method with knowledge-guided pre-training and fine-tuning for action recognition: external human parsing knowledge is incorporated to generate informative representations during pre-training, and the pre-trained knowledge is preserved during fine-tuning through self-distillation, avoiding catastrophic forgetting. Benefiting from the external human parsing knowledge, video-level contrastive learning, and knowledge-preserving self-distillation, our model achieves state-of-the-art performance on two popular benchmarks, UCF101 and HMDB51, verifying the effectiveness of the proposed method.
Pages: 10
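To make the fine-tuning idea described in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of knowledge-preserving self-distillation: a frozen copy of the pre-trained video encoder acts as a teacher, and a distillation term keeps the fine-tuned (student) encoder close to it while a standard classification loss adapts it to action labels. Names such as `VideoEncoder`, `KnowledgePreservingFineTuner`, `distill_weight`, and the MSE feature-matching loss are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch (not the authors' code) of knowledge-preserving
# self-distillation during fine-tuning, as outlined in the abstract.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class VideoEncoder(nn.Module):
    """Stand-in backbone: maps a clip (B, C, T, H, W) to a feature vector."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class KnowledgePreservingFineTuner(nn.Module):
    def __init__(self, pretrained_encoder: VideoEncoder, num_classes: int,
                 distill_weight: float = 1.0):
        super().__init__()
        self.student = pretrained_encoder
        # Frozen teacher: an exact copy of the pre-trained weights.
        self.teacher = copy.deepcopy(pretrained_encoder).eval()
        for p in self.teacher.parameters():
            p.requires_grad_(False)
        self.classifier = nn.Linear(512, num_classes)
        self.distill_weight = distill_weight

    def forward(self, clips: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        feats = self.student(clips)
        logits = self.classifier(feats)
        ce_loss = F.cross_entropy(logits, labels)          # supervised action loss
        with torch.no_grad():
            teacher_feats = self.teacher(clips)
        # Self-distillation: keep fine-tuned features close to pre-trained ones,
        # mitigating catastrophic forgetting of the pre-training knowledge.
        distill_loss = F.mse_loss(feats, teacher_feats)
        return ce_loss + self.distill_weight * distill_loss


if __name__ == "__main__":
    encoder = VideoEncoder()                    # pretend this was pre-trained
    model = KnowledgePreservingFineTuner(encoder, num_classes=101)
    clips = torch.randn(2, 3, 8, 32, 32)        # (batch, channels, frames, H, W)
    labels = torch.randint(0, 101, (2,))
    loss = model(clips, labels)
    loss.backward()
    print(float(loss))
```

The `distill_weight` hyperparameter trades off adaptation to the labeled action dataset against retention of the pre-trained representation; the paper's actual distillation target and loss form may differ.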