Knowledge-guided pre-training and fine-tuning: Video representation learning for action recognition

Cited by: 3
Authors
Wang, Guanhong [1 ,2 ]
Zhou, Yang [1 ]
He, Zhanhao [1 ]
Lu, Keyu [1 ]
Feng, Yang [3 ]
Liu, Zuozhu [1 ,2 ]
Wang, Gaoang [1 ,2 ]
Affiliations
[1] Zhejiang Univ, Zhejiang Univ Univ Illinois Urbana Champaign Inst, Haining 314400, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[3] Angelalign Inc, Angelalign Res Inst, Shanghai 200011, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Video representation learning; Knowledge distillation; Action recognition; Video retrieval;
DOI
10.1016/j.neucom.2023.127136
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Video-based action recognition is an important task in the computer vision community, aiming to extract rich spatial-temporal information to recognize human actions from videos. Many approaches adopt self-supervised learning on large-scale unlabeled datasets and exploit transfer learning in the downstream action recognition task. Though much progress has been made in action recognition with video representation learning, two main issues remain for most existing methods. First, pre-training with self-supervised pretext tasks usually learns neutral representations that are not very informative for the downstream action recognition task. Second, the valuable knowledge learned from large-scale pre-training datasets is gradually forgotten in the fine-tuning stage. To address these issues, in this paper we propose a novel video representation learning method with knowledge-guided pre-training and fine-tuning for action recognition, which incorporates external human parsing knowledge to generate informative representations during pre-training, and preserves the pre-trained knowledge in the fine-tuning stage via self-distillation to avoid catastrophic forgetting. Our model, with contributions from the external human parsing knowledge, video-level contrastive learning, and knowledge-preserving self-distillation, achieves state-of-the-art performance on two popular benchmarks, i.e., UCF101 and HMDB51, verifying the effectiveness of the proposed method.
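The two training signals named in the abstract — video-level contrastive learning and knowledge-preserving self-distillation — can be sketched roughly as below. This is a minimal NumPy illustration under stated assumptions: an InfoNCE loss over matched clip embeddings as the contrastive term, and a KL divergence against a frozen pre-trained teacher as the self-distillation term. The function names, temperatures, and exact formulation are illustrative, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def info_nce(z1, z2, tau=0.1):
    """Video-level contrastive loss: embeddings of two views of the
    same video (row i of z1 and z2) are positives; all other rows
    in the batch are negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau              # (N, N) cosine similarities
    log_probs = np.log(softmax(logits, axis=1))
    return float(-np.mean(np.diag(log_probs)))

def self_distill_kl(student_logits, teacher_logits, T=2.0):
    """Knowledge-preserving self-distillation: KL(teacher || student)
    on temperature-softened logits, penalising the fine-tuned student
    for drifting away from the frozen pre-trained teacher."""
    p = softmax(teacher_logits / T, axis=1)
    q = softmax(student_logits / T, axis=1)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)) * T * T)
```

During fine-tuning, a weighted sum of the task loss and `self_distill_kl` would be minimized; the KL term vanishes when the student's predictions match the teacher's, so only genuine drift is penalised.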
Pages: 10
Related papers
50 records
  • [1] A knowledge-guided pre-training framework for improving molecular representation learning
    Li, Han
    Zhang, Ruotian
    Min, Yaosen
    Ma, Dacheng
    Zhao, Dan
    Zeng, Jianyang
    NATURE COMMUNICATIONS, 2023, 14 (01)
  • [3] Protected Health Information Recognition by Fine-Tuning a Pre-training Transformer Model
    Oh, Seo Hyun
    Kang, Min
    Lee, Youngho
    HEALTHCARE INFORMATICS RESEARCH, 2022, 28 (01) : 16 - 24
  • [4] KGWE: A Knowledge-guided Word Embedding Fine-tuning Model
    Kun, Kong Wei
    Racharak, Teeradaj
    Yiming, Cao
    Cheng, Peng
    Le Nguyen, Minh
    2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 1221 - 1225
  • [5] KNOWLEDGE DISTILLATION FROM BERT IN PRE-TRAINING AND FINE-TUNING FOR POLYPHONE DISAMBIGUATION
    Sun, Hao
    Tan, Xu
    Gan, Jun-Wei
    Zhao, Sheng
    Han, Dongxu
    Liu, Hongzhi
    Qin, Tao
    Liu, Tie-Yan
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 168 - 175
  • [6] Pre-training Fine-tuning data Enhancement method based on active learning
    Cao, Deqi
    Ding, Zhaoyun
    Wang, Fei
    Ma, Haoyang
    2022 IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, 2022, : 1447 - 1454
  • [7] FOOD IMAGE RECOGNITION USING DEEP CONVOLUTIONAL NETWORK WITH PRE-TRAINING AND FINE-TUNING
    Yanai, Keiji
    Kawano, Yoshiyuki
    2015 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2015,
  • [8] SAR-HUB: Pre-Training, Fine-Tuning, and Explaining
    Yang, Haodong
    Kang, Xinyue
    Liu, Long
    Liu, Yujiang
    Huang, Zhongling
    REMOTE SENSING, 2023, 15 (23)
  • [9] AlignDet: Aligning Pre-training and Fine-tuning in Object Detection
    Li, Ming
    Wu, Jie
    Wang, Xionghui
    Chen, Chen
    Qin, Jie
    Xiao, Xuefeng
    Wang, Rui
    Zheng, Min
    Pan, Xin
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 6843 - 6853
  • [10] Improved Fine-Tuning by Better Leveraging Pre-Training Data
    Liu, Ziquan
    Xu, Yi
    Xu, Yuanhong
    Qian, Qi
    Li, Hao
    Ji, Xiangyang
    Chan, Antoni B.
    Jin, Rong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,