Cross-modality online distillation for multi-view action recognition

Cited: 12
Authors
Xu, Chao [1 ,2 ]
Wu, Xia [1 ,2 ]
Li, Yachun [1 ,2 ]
Jin, Yining [3 ]
Wang, Mengmeng [1 ,2 ]
Liu, Yong [1 ,2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Ind Control Technol, Hangzhou, Peoples R China
[2] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
National Natural Science Foundation of China;
Keywords
Multi-view; Cross-modality; Action recognition; Online distillation; MODEL; NETWORK;
DOI
10.1016/j.neucom.2021.05.077
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, multi-modality features have been introduced into multi-view action recognition methods to obtain more robust performance. However, not all modalities are available in real applications; for example, everyday scenes often lack depth data and capture only RGB sequences. This raises the challenge of learning critical features from multi-modality data at training time while still achieving robust performance from RGB sequences alone at test time. To address this challenge, our paper presents a novel two-stage teacher-student framework. The teacher network exploits multi-view geometry and texture features during training, while the student network is given only RGB sequences at test time. Specifically, in the first stage, a Cross-modality Aggregated Transfer (CAT) network is proposed to transfer multi-view cross-modality aggregated features from the teacher network to the student network. Moreover, we design a Viewpoint-Aware Attention (VAA) module that captures discriminative information across different views to combine multi-view features effectively. In the second stage, a Multi-view Features Strengthen (MFS) network with the VAA module further strengthens the global view-invariant features of the student network. In addition, both CAT and MFS learn in an online distillation manner, so the teacher and student networks can be trained jointly. Extensive experiments on IXMAS and Northwestern-UCLA demonstrate the effectiveness of our proposed method. (c) 2021 Elsevier B.V. All rights reserved.
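As a rough illustration of the ideas named in the abstract, the PyTorch sketch below combines attention-weighted multi-view fusion with a joint teacher-student objective trained in one pass (online distillation). It is a minimal sketch under stated assumptions, not the authors' CAT/MFS implementation: the module name ViewpointAwareAttention, the feature/logit shapes, and the loss weights alpha and temperature are all illustrative choices, and the actual architectures are not specified in this record.

```python
# Hypothetical sketch of online distillation with viewpoint-aware fusion.
# All names, shapes, and hyper-parameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViewpointAwareAttention(nn.Module):
    """Stand-in for a VAA-style module: scores each view and fuses features."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, feat_dim)
        attn = torch.softmax(self.score(view_feats), dim=1)  # (B, V, 1)
        return (attn * view_feats).sum(dim=1)                # (B, feat_dim)


def online_distillation_loss(student_feat, teacher_feat,
                             student_logits, teacher_logits, labels,
                             alpha=0.5, temperature=4.0):
    """Joint objective so teacher and student can be trained together.

    Both networks get a supervised cross-entropy term; the student is also
    pulled toward the teacher's fused multi-view, multi-modality features
    and its temperature-softened predictions.
    """
    ce_student = F.cross_entropy(student_logits, labels)
    ce_teacher = F.cross_entropy(teacher_logits, labels)
    # Feature-level transfer; teacher detached so this term trains the student only.
    feat_align = F.mse_loss(student_feat, teacher_feat.detach())
    # Prediction-level transfer with softened distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return ce_student + ce_teacher + alpha * (feat_align + kd)
```

A usage step would compute teacher features from depth and RGB across views, fuse them with the attention module, and apply the loss to both networks' outputs in the same backward pass, which is what distinguishes online distillation from a frozen, pre-trained teacher.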
Pages: 384-393
Number of pages: 10