Cross-modality online distillation for multi-view action recognition

Cited by: 12
Authors
Xu, Chao [1 ,2 ]
Wu, Xia [1 ,2 ]
Li, Yachun [1 ,2 ]
Jin, Yining [3 ]
Wang, Mengmeng [1 ,2 ]
Liu, Yong [1 ,2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Ind Control Technol, Hangzhou, Peoples R China
[2] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
National Natural Science Foundation of China;
Keywords
Multi-view; Cross-modality; Action recognition; Online distillation; MODEL; NETWORK;
DOI
10.1016/j.neucom.2021.05.077
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, multi-modality features have been introduced into multi-view action recognition methods to obtain more robust performance. However, not all modalities are available in real applications; for example, daily scenes lack depth data and capture only RGB sequences. This raises the challenge of learning critical features from multi-modality data at training time while still achieving robust performance from RGB sequences alone at test time. To address this challenge, this paper presents a novel two-stage teacher-student framework. The teacher network exploits multi-view geometry-and-texture features during training, while the student network is given only RGB sequences at test time. Specifically, in the first stage, a Cross-modality Aggregated Transfer (CAT) network is proposed to transfer multi-view cross-modality aggregated features from the teacher network to the student network. Moreover, we design a Viewpoint-Aware Attention (VAA) module that captures discriminative information across different views to combine multi-view features effectively. In the second stage, a Multi-view Features Strengthen (MFS) network with the VAA module further strengthens the global view-invariant features of the student network. Both CAT and MFS learn in an online distillation manner, so the teacher and student networks can be trained jointly. Extensive experiments on IXMAS and Northwestern-UCLA demonstrate the effectiveness of the proposed method. (c) 2021 Elsevier B.V. All rights reserved.
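To make the online-distillation idea in the abstract concrete, below is a minimal PyTorch-style sketch of one joint training step: a teacher that sees multi-view RGB+depth data and a student that sees RGB only are optimized together, with the student distilling the teacher's soft predictions and attention-fused features. The `TeacherNet`/`StudentNet` interfaces, the attention design, and all loss weights are hypothetical stand-ins; the record does not describe the actual CAT, VAA, or MFS implementations.

```python
# Hedged sketch of online cross-modality distillation (hypothetical
# interfaces; not the paper's actual CAT/VAA/MFS code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewpointAwareAttention(nn.Module):
    """Weights per-view features before fusion (one plausible reading of VAA)."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, feat_dim)
        attn = torch.softmax(self.score(view_feats), dim=1)  # (B, V, 1)
        return (attn * view_feats).sum(dim=1)                # (B, D)

def online_distillation_step(teacher, student, vaa, rgb_views, depth_views,
                             labels, optimizer, temperature=4.0, alpha=0.5):
    """One joint update: teacher uses RGB+depth multi-view inputs, the
    student uses RGB only; both train together (online distillation)."""
    # Assumed interfaces: each network returns per-view features and logits.
    t_view_feats, t_logits = teacher(rgb_views, depth_views)
    s_view_feats, s_logits = student(rgb_views)

    # Fuse multi-view features with viewpoint-aware attention.
    t_fused = vaa(t_view_feats)
    s_fused = vaa(s_view_feats)

    # Supervised losses for both networks (no pre-trained, frozen teacher).
    ce = F.cross_entropy(t_logits, labels) + F.cross_entropy(s_logits, labels)

    # Distill the teacher's soft predictions and fused features to the student.
    kd = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                  F.softmax(t_logits / temperature, dim=1).detach(),
                  reduction="batchmean") * temperature ** 2
    feat = F.mse_loss(s_fused, t_fused.detach())

    loss = ce + alpha * (kd + feat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Detaching the teacher's outputs in the distillation terms keeps the student's gradients from flowing back into the teacher, while the shared optimizer step still trains both networks jointly, which is the essence of the "online" setting the abstract describes.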
Pages: 384-393
Page count: 10
Related Papers
50 records in total
  • [31] Multi-View and Multi-Modal Action Recognition with Learned Fusion
    Ardianto, Sandy
    Hang, Hsueh-Ming
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1601 - 1604
  • [32] Active Multi-view Object Recognition and Online Feature Selection
    Potthast, Christian
    Breitenmoser, Andreas
    Sha, Fei
    Sukhatme, Gaurav S.
    ROBOTICS RESEARCH, VOL 2, 2018, 3 : 471 - 488
  • [33] Semi-Supervised Cross-Modality Action Recognition by Latent Tensor Transfer Learning
    Jia, Chengcheng
    Ding, Zhengming
    Kong, Yu
    Fu, Yun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (09) : 2801 - 2814
  • [34] Variational Distillation for Multi-View Learning
    Tian, Xudong
    Zhang, Zhizhong
    Wang, Cong
    Zhang, Wensheng
    Qu, Yanyun
    Ma, Lizhuang
    Wu, Zongze
    Xie, Yuan
    Tao, Dacheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (07) : 4551 - 4566
  • [35] Discriminative Multi-View Dynamic Image Fusion for Cross-View 3-D Action Recognition
    Wang, Yancheng
    Xiao, Yang
    Lu, Junyi
    Tan, Bo
    Cao, Zhiguo
    Zhang, Zhenjun
    Zhou, Joey Tianyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (10) : 5332 - 5345
  • [36] Pairwise-Covariance Multi-view Discriminant Analysis for Robust Cross-View Human Action Recognition
    Tran, Hoang-Nhat
    Nguyen, Hong-Quan
    Doan, Huong-Giang
    Tran, Thanh-Hai
    Le, Thi-Lan
    Vu, Hai
    IEEE ACCESS, 2021, 9 : 76097 - 76111
  • [37] Cross-Modality Compensation Convolutional Neural Networks for RGB-D Action Recognition
    Cheng, Jun
    Ren, Ziliang
    Zhang, Qieshi
    Gao, Xiangyang
    Hao, Fusheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) : 1498 - 1509
  • [38] Neural representation and learning for multi-view human action recognition
    Iosifidis, Alexandros
    Tefas, Anastasios
    Pitas, Ioannis
    2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012,
  • [39] Learning Multi-View Interactional Skeleton Graph for Action Recognition
    Wang, Minsi
    Ni, Bingbing
    Yang, Xiaokang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 6940 - 6954
  • [40] Jointly Learning Multi-view Features for Human Action Recognition
    Wang, Ruoshi
    Liu, Zhigang
    Yin, Ziyang
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 4858 - 4861