Cross-modality online distillation for multi-view action recognition

Cited by: 12
Authors
Xu, Chao [1 ,2 ]
Wu, Xia [1 ,2 ]
Li, Yachun [1 ,2 ]
Jin, Yining [3 ]
Wang, Mengmeng [1 ,2 ]
Liu, Yong [1 ,2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Ind Control Technol, Hangzhou, Peoples R China
[2] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
National Natural Science Foundation of China;
Keywords
Multi-view; Cross-modality; Action recognition; Online distillation; MODEL; NETWORK;
DOI
10.1016/j.neucom.2021.05.077
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, multi-modality features have been introduced into multi-view action recognition methods to obtain more robust performance. However, not all modalities are available in real applications: daily scenes often lack depth data and capture only RGB sequences. This raises the challenge of learning critical features from multi-modality data at training time while still achieving robust performance with RGB sequences alone at test time. To address this challenge, our paper presents a novel two-stage teacher-student framework. The teacher network exploits multi-view geometry and texture features during training, while the student network is given only RGB sequences at test time. Specifically, in the first stage, a Cross-modality Aggregated Transfer (CAT) network is proposed to transfer multi-view cross-modality aggregated features from the teacher network to the student network. Moreover, we design a Viewpoint-Aware Attention (VAA) module which captures discriminative information across different views to combine multi-view features effectively. In the second stage, a Multi-view Features Strengthen (MFS) network with the VAA module further strengthens the global view-invariant features of the student network. Besides, both CAT and MFS learn in an online distillation manner, so that the teacher and student networks can be trained jointly. Extensive experiments on IXMAS and Northwestern-UCLA demonstrate the effectiveness of our proposed method. (c) 2021 Elsevier B.V. All rights reserved.
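The abstract describes a two-stage teacher-student pipeline: CAT aligns the student's RGB features with the teacher's aggregated multi-view, multi-modality features, VAA weights the per-view features, and online distillation trains teacher and student jointly. The sketch below is a minimal PyTorch-style illustration of that idea based only on the abstract; all class names, loss choices (MSE feature alignment, KL logit distillation), and tensor shapes are assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the two-stage teacher-student idea from the abstract.
# Module names, losses, and shapes are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViewpointAwareAttention(nn.Module):
    """Weights per-view features so discriminative views dominate the fusion."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, feat_dim)
        weights = torch.softmax(self.score(view_feats), dim=1)  # (B, V, 1)
        return (weights * view_feats).sum(dim=1)                # (B, feat_dim)


class OnlineDistiller(nn.Module):
    """Teacher sees multi-view RGB + depth; student sees single-view RGB only."""

    def __init__(self, teacher: nn.Module, student: nn.Module,
                 feat_dim: int, num_classes: int):
        super().__init__()
        self.teacher, self.student = teacher, student
        self.vaa = ViewpointAwareAttention(feat_dim)
        self.t_head = nn.Linear(feat_dim, num_classes)
        self.s_head = nn.Linear(feat_dim, num_classes)

    def forward(self, rgb_views, depth_views, rgb_single):
        # Stage 1 (CAT-style): aggregate multi-view, multi-modality teacher
        # features and align the student's RGB-only features with them.
        t_feats = torch.stack(
            [self.teacher(r, d) for r, d in zip(rgb_views, depth_views)], dim=1)
        t_global = self.vaa(t_feats)          # fused teacher representation
        s_feat = self.student(rgb_single)     # RGB-only student feature
        cat_loss = F.mse_loss(s_feat, t_global.detach())

        # Stage 2 (MFS-style): strengthen view-invariance by distilling the
        # teacher's class posterior into the student (online, trained jointly).
        t_logits, s_logits = self.t_head(t_global), self.s_head(s_feat)
        kd_loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                           F.softmax(t_logits.detach(), dim=1),
                           reduction="batchmean")
        return t_logits, s_logits, cat_loss + kd_loss
```

In training, cross-entropy on both heads would be added to the distillation terms; at test time only the student branch is run on RGB sequences, mirroring the RGB-only inference setting described in the abstract.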
Pages: 384-393
Number of pages: 10
Related papers
50 records in total
  • [1] Privacy-Safe Action Recognition via Cross-Modality Distillation
    Kim, Yuhyun
    Jung, Jinwook
    Noh, Hyeoncheol
    Ahn, Byungtae
    Kwon, Junghye
    Choi, Dong-Geol
    IEEE ACCESS, 2024, 12 : 125955 - 125965
  • [2] Multi-view Cross-Modality MR Image Translation for Vestibular Schwannoma and Cochlea Segmentation
    Kang, Bogyeong
    Nam, Hyeonyeong
    Han, Ji-Wung
    Heo, Keun-Soo
    Kam, Tae-Eui
    BRAINLESION: GLIOMA, MULTIPLE SCLEROSIS, STROKE AND TRAUMATIC BRAIN INJURIES, BRAINLES 2022, PT II, 2023, 14092 : 100 - 108
  • [3] The Effect of Audiovisual Cross-Modality on QoE of Multi-View Video and Audio IP Transmission
    Nunome, Toshiro
    Sako, Kazunori
    2016 18TH ASIA-PACIFIC NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM (APNOMS), 2016,
  • [4] Multi-view representation learning for multi-view action recognition
    Hao, Tong
    Wu, Dan
    Wang, Qian
    Sun, Jin-Sheng
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2017, 48 : 453 - 460
  • [5] MMA: a multi-view and multi-modality benchmark dataset for human action recognition
    Gao, Zan
    Han, Tao-tao
    Zhang, Hua
    Xue, Yan-bing
    Xu, Guang-ping
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (22) : 29383 - 29404
  • [6] MMA: a multi-view and multi-modality benchmark dataset for human action recognition
    Zan Gao
    Tao-tao Han
    Hua Zhang
    Yan-bing Xue
    Guang-ping Xu
    Multimedia Tools and Applications, 2018, 77 : 29383 - 29404
  • [7] Multi-View Action Recognition by Cross-domain Learning
    Nie, Weizhi
    Liu, Anan
    Yu, Jing
    Su, Yuting
    Chaisorn, Lekha
    Wang, Yongkang
    Kankanhalli, Mohan S.
    2014 IEEE 16TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2014,
  • [8] Cross-modality neighbor constraints based unbalanced multi-view text–image re-identification
    Li, Yongxi
    Tang, Wenzhong
    Zhang, Ke
    Zhu, Xi
    Wang, Haoming
    Wang, Shuai
    Multimedia Systems, 2024, 30 (06)
  • [9] MULTI-VIEW CONTRASTIVE LEARNING FOR ONLINE KNOWLEDGE DISTILLATION
    Yang, Chuanguang
    An, Zhulin
    Xu, Yongjun
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3750 - 3754
  • [10] Towards Cross-Modality Medical Image Segmentation with Online Mutual Knowledge Distillation
    Li, Kang
    Yu, Lequan
    Wang, Shujun
    Heng, Pheng-Ann
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 775 - 783