Modality Distillation with Multiple Stream Networks for Action Recognition

Cited by: 106
Authors
Garcia, Nuno C. [1, 2]
Morerio, Pietro [1]
Murino, Vittorio [1, 3]
Affiliations
[1] Ist Italiano Tecnol, Genoa, Italy
[2] Univ Genoa, Genoa, Italy
[3] Univ Verona, Verona, Italy
Source
Keywords
Action recognition; Deep multimodal learning; Distillation; Privileged information;
DOI
10.1007/978-3-030-01237-3_7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Diverse input data modalities can provide complementary cues for several tasks, usually leading to more robust algorithms and better performance. However, while a (training) dataset could be accurately designed to include a variety of sensory inputs, it is often the case that not all modalities are available in real-life (testing) scenarios, where a model has to be deployed. This raises the challenge of how to learn robust representations leveraging multimodal data in the training stage, while considering limitations at test time, such as noisy or missing modalities. This paper presents a new approach for multimodal video action recognition, developed within the unified frameworks of distillation and privileged information, named generalized distillation. In particular, we consider the case of learning representations from depth and RGB videos, while relying on RGB data only at test time. We propose a new approach to train a hallucination network that learns to distill depth features through multiplicative connections of spatiotemporal representations, leveraging soft labels and hard labels, as well as the distance between feature maps. We report state-of-the-art results on video action classification on the largest multimodal dataset available for this task, the NTU RGB+D, as well as on the UWA3DII and Northwestern-UCLA datasets.
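The abstract names three training signals for the hallucination network: hard labels, soft labels, and a distance between feature maps. The sketch below shows one plausible way such terms could be combined into a single loss. It is an illustration only, not the authors' formulation: the function names, the temperature, the `imitation` and `feature_weight` parameters, and the mean-squared feature distance are all assumptions, and the multiplicative connections between streams are not modeled.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def generalized_distillation_loss(
    student_logits,      # hallucination (RGB-only) stream outputs
    teacher_logits,      # depth stream outputs, available only during training
    label,               # ground-truth class index (hard label)
    student_features,    # flattened feature map of the hallucination stream
    teacher_features,    # flattened feature map of the depth stream
    temperature=2.0,     # assumed value, not taken from the paper
    imitation=0.5,       # assumed trade-off between hard and soft labels
    feature_weight=1.0,  # assumed weight of the feature-distance term
):
    """Hypothetical combination of hard-label, soft-label, and feature-distance terms."""
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    # Hard-label term: ordinary classification loss on the ground truth.
    hard = cross_entropy(one_hot, softmax(student_logits))
    # Soft-label term: match the teacher's temperature-softened predictions.
    soft = cross_entropy(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    # Feature term: mean squared distance between the two feature maps.
    feat = sum((s - t) ** 2 for s, t in zip(student_features, teacher_features))
    feat /= len(student_features)
    return (1.0 - imitation) * hard + imitation * soft + feature_weight * feat
```

At test time only the RGB-based streams would be evaluated; the teacher inputs appear here solely because the loss is computed during training.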
Pages: 106-121
Page count: 16
Related papers
50 in total
  • [1] Focal Channel Knowledge Distillation for Multi-Modality Action Recognition
    Gan, Lipeng
    Cao, Runze
    Li, Ning
    Yang, Man
    Li, Xiaochao
    [J]. IEEE ACCESS, 2023, 11 : 78285 - 78298
  • [2] Distillation Multiple Choice Learning for Multimodal Action Recognition
    Garcia, Nuno Cruz
    Bargal, Sarah Adel
    Ablavsky, Vitaly
    Morerio, Pietro
    Murino, Vittorio
    Sclaroff, Stan
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 2754 - 2763
  • [3] Cross-modality online distillation for multi-view action recognition
    Xu, Chao
    Wu, Xia
    Li, Yachun
    Jin, Yining
    Wang, Mengmeng
    Liu, Yong
    [J]. NEUROCOMPUTING, 2021, 456 : 384 - 393
  • [4] Privacy-Safe Action Recognition via Cross-Modality Distillation
    Kim, Yuhyun
    Jung, Jinwook
    Noh, Hyeoncheol
    Ahn, Byungtae
    Kwon, Junghye
    Choi, Dong-Geol
    [J]. IEEE ACCESS, 2024, 12 : 125955 - 125965
  • [5] Cross-stream Selective Networks for Action Recognition
    Pan, Bowen
    Sun, Jiankai
    Lin, Wuwei
    Wang, Limin
    Lin, Weiyao
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 454 - 460
  • [6] Evaluation of Triple-Stream Convolutional Networks for Action Recognition
    Liu, Dichao
    Wang, Yu
    Kato, Jien
    [J]. 2017 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING - TECHNIQUES AND APPLICATIONS (DICTA), 2017, : 513 - 518
  • [7] Hidden Two-Stream Convolutional Networks for Action Recognition
    Zhu, Yi
    Lan, Zhenzhong
    Newsam, Shawn
    Hauptmann, Alexander
    [J]. COMPUTER VISION - ACCV 2018, PT III, 2019, 11363 : 363 - 378
  • [8] Two-Stream Convolutional Networks for Action Recognition in Videos
    Simonyan, Karen
    Zisserman, Andrew
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [9] Multi-Stream Interaction Networks for Human Action Recognition
    Wang, Haoran
    Yu, Baosheng
    Li, Jiaqi
    Zhang, Linlin
    Chen, Dongyue
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (05) : 3050 - 3060
  • [10] Two-stream spatiotemporal networks for skeleton action recognition
    Wang, Lei
    Zhang, Jianwei
    Yang, Shanmin
    Gu, Song
    [J]. IET IMAGE PROCESSING, 2023, 17 (11) : 3358 - 3370