Knowledge-guided pre-training and fine-tuning: Video representation learning for action recognition

Cited by: 3
Authors
Wang, Guanhong [1 ,2 ]
Zhou, Yang [1 ]
He, Zhanhao [1 ]
Lu, Keyu [1 ]
Feng, Yang [3 ]
Liu, Zuozhu [1 ,2 ]
Wang, Gaoang [1 ,2 ]
Affiliations
[1] Zhejiang Univ, Zhejiang Univ Univ Illinois Urbana Champaign Inst, Haining 314400, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[3] Angelalign Inc, Angelalign Res Inst, Shanghai 200011, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Video representation learning; Knowledge distillation; Action recognition; Video retrieval;
DOI
10.1016/j.neucom.2023.127136
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video-based action recognition is an important task in computer vision, aiming to extract rich spatial-temporal information for recognizing human actions in videos. Many approaches adopt self-supervised learning on large-scale unlabeled datasets and transfer the learned representations to the downstream action recognition task. Despite much progress in action recognition with video representation learning, two main issues remain for most existing methods. First, pre-training with self-supervised pretext tasks usually yields neutral, weakly informative representations for downstream action recognition. Second, the valuable knowledge learned from large-scale pre-training datasets is gradually forgotten during fine-tuning. To address these issues, we propose a novel video representation learning method with knowledge-guided pre-training and fine-tuning for action recognition, which incorporates external human parsing knowledge to generate informative representations during pre-training, and preserves the pre-trained knowledge during fine-tuning via self-distillation to avoid catastrophic forgetting. Our model, combining external human parsing knowledge, video-level contrastive learning, and knowledge-preserving self-distillation, achieves state-of-the-art performance on two popular benchmarks, UCF101 and HMDB51, verifying the effectiveness of the proposed method.
Pages: 10
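
For readers who want a concrete picture of the objectives named in the abstract, the minimal PyTorch sketch below shows standard formulations of two of them: a video-level InfoNCE contrastive loss for pre-training and a knowledge-preserving self-distillation loss for fine-tuning, computed against a frozen copy of the pre-trained model. This is an illustrative sketch, not the authors' implementation: all function names, the temperature values, and the weight alpha are assumptions, and the paper's exact objectives are given at the DOI above.

import torch
import torch.nn.functional as F

def video_infonce_loss(z_a, z_b, temperature=0.1):
    # Video-level contrastive loss (standard InfoNCE): embeddings of two
    # augmented clips from the same video are positives; clips from all
    # other videos in the batch serve as negatives.
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature              # (B, B) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

def self_distillation_loss(student_logits, teacher_logits, tau=4.0):
    # Knowledge-preserving self-distillation: softened predictions of a
    # frozen copy of the pre-trained model (teacher) regularize the
    # fine-tuned model (student), mitigating catastrophic forgetting.
    p_t = F.softmax(teacher_logits / tau, dim=1)
    log_p_s = F.log_softmax(student_logits / tau, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau * tau

def fine_tune_step(student, teacher, clips, labels, alpha=0.5):
    # One possible fine-tuning objective: supervised action-recognition
    # loss plus the distillation term; alpha (an assumed value) balances
    # adaptation against knowledge preservation.
    with torch.no_grad():
        t_logits = teacher(clips)                     # teacher stays frozen
    s_logits = student(clips)
    ce = F.cross_entropy(s_logits, labels)
    return ce + alpha * self_distillation_loss(s_logits, t_logits)

The tau * tau factor in the distillation term is the usual gradient-scale correction for temperature-softened knowledge distillation (Hinton et al., 2015); the abstract's third ingredient, external human parsing knowledge, depends on a parsing model and is not sketched here.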