Distilling interaction knowledge for semi-supervised egocentric action recognition

Cited by: 0
Authors
Wang, Haoran [1 ]
Yang, Jiahao [1 ]
Yu, Baosheng [2 ]
Zhan, Yibing [3 ]
Tao, Dapeng [4 ]
Ling, Haibin [5 ]
Affiliations
[1] College of Information Science and Engineering, Northeastern University, Shenyang, 110819, China
[2] School of Computer Science, The University of Sydney, Darlington, NSW, 2008, Australia
[3] JD Explore Academy, Beijing, 100176, China
[4] School of Information Science and Engineering, Yunnan University, Kunming, Yunnan, 650091, China
[5] Department of Computer Science, Stony Brook University, Stony Brook, United States
Keywords
Contrastive learning; Semi-supervised learning; Video analysis
DOI
10.1016/j.patcog.2024.110927
Abstract
Egocentric action recognition, the identification of actions in video captured from a first-person perspective, is receiving increasing attention due to the widespread adoption of wearable cameras. Nonetheless, annotating actions in videos characterized by cluttered backgrounds and the presence of various objects is labor-intensive. In this paper, we consider learning for egocentric action recognition in a semi-supervised manner. Inspired by the fact that videos captured from a first-person viewpoint usually contain rich content about how human hands interact with objects, we propose to employ a popular teacher–student framework and distill the interaction knowledge between hands and objects for semi-supervised egocentric action recognition. We refer to the proposed method as Interaction Knowledge Distillation, or IKD. Specifically, the teacher network takes the hands and action-related objects in labeled videos as input and uses graph neural networks to capture their spatial–temporal relations as graph edge features. The student network then takes the detected hands/objects from both labeled and unlabeled videos as input and mimics the teacher network to learn from the interactions and improve model performance. Experiments on two popular egocentric action recognition datasets, Something-Something-V2 and EPIC-KITCHENS-100, show that the proposed approach consistently outperforms recent state-of-the-art methods in typical semi-supervised settings. © 2024 Elsevier Ltd
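The core idea in the abstract, teacher edge features over hand–object pairs that a student is trained to mimic, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the simple geometric edge features, the function names, and the plain L2 mimic loss stand in for the paper's actual graph-neural-network design.

```python
# Illustrative sketch of interaction knowledge distillation (not the
# paper's implementation): a teacher derives hand-object "edge" features
# from detected boxes, and a student is penalized for deviating from them.

def edge_feature(hand_box, obj_box):
    """Relative-geometry feature for one hand-object edge.
    Boxes are (x1, y1, x2, y2); returns (dx, dy, size ratio)."""
    hx = (hand_box[0] + hand_box[2]) / 2.0
    hy = (hand_box[1] + hand_box[3]) / 2.0
    ox = (obj_box[0] + obj_box[2]) / 2.0
    oy = (obj_box[1] + obj_box[3]) / 2.0
    hand_w = hand_box[2] - hand_box[0]
    obj_w = obj_box[2] - obj_box[0]
    # Center offset plus width ratio as a crude interaction descriptor.
    return (ox - hx, oy - hy, (obj_w + 1e-6) / (hand_w + 1e-6))

def distillation_loss(teacher_edges, student_edges):
    """Mean squared error between teacher and student edge features."""
    total, count = 0.0, 0
    for t_edge, s_edge in zip(teacher_edges, student_edges):
        for t_val, s_val in zip(t_edge, s_edge):
            total += (t_val - s_val) ** 2
            count += 1
    return total / count

# Toy example: one detected hand interacting with two objects.
hand = (10, 10, 30, 30)
objects = [(40, 12, 60, 32), (15, 50, 35, 70)]
teacher = [edge_feature(hand, obj) for obj in objects]
student = [(25.0, 1.0, 0.9), (4.0, 41.0, 1.1)]  # hypothetical student predictions
loss = distillation_loss(teacher, student)
print(round(loss, 4))
```

In the actual method the student also processes unlabeled videos, so this mimic loss supplies a training signal where no action labels exist.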