Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition

Cited by: 0
Authors
Yu, Bruce X. B. [1]
Liu, Yan [1]
Chan, Keith C. C. [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Indoor action recognition plays an important role in modern society, for example in intelligent healthcare in large mobile cabin hospitals. With the wide use of depth sensors such as Kinect, multimodal information, including the skeleton and RGB modalities, offers a promising way to improve recognition performance. However, existing methods either focus on a single data modality or fail to take advantage of multiple data modalities. In this paper, we propose a Teacher-Student Multimodal Fusion (TSMF) model that fuses the skeleton and RGB modalities at the model level for indoor action recognition. In TSMF, a teacher network transfers the structural knowledge of the skeleton modality to a student network for the RGB modality. Extensive experiments on two benchmark datasets, NTU RGB+D and PKU-MMD, show that the proposed TSMF consistently outperforms state-of-the-art single-modal and multimodal methods. The results also indicate that TSMF improves not only the accuracy of the student network but also, significantly, the ensemble accuracy.
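The abstract does not specify how the teacher's knowledge is transferred or how the two networks are fused. As a rough illustration of the general teacher-student idea only, the sketch below uses generic soft-label knowledge distillation and averages the two networks' class probabilities for the ensemble; this is a swapped-in standard technique, not the TSMF method. All function names, the `alpha` weight, and the `temperature` value are hypothetical choices made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, optionally softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the ground-truth class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def student_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Generic distillation loss (an assumption, not the paper's objective).

    The RGB student is trained on the true label (hard term) and on the
    softened predictions of the skeleton teacher (soft term).
    """
    hard = cross_entropy(softmax(student_logits), label)
    soft = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    # temperature**2 keeps the soft term's gradient scale comparable to the hard term
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft


def ensemble_predict(teacher_logits, student_logits):
    """Predict by averaging the two networks' class probabilities."""
    teacher_probs = softmax(teacher_logits)
    student_probs = softmax(student_logits)
    averaged = [(t + s) / 2 for t, s in zip(teacher_probs, student_probs)]
    return max(range(len(averaged)), key=averaged.__getitem__)
```

In this sketch the soft term vanishes when the student already matches the teacher, so the loss reduces to the weighted hard-label term; a real implementation would compute both terms on batches inside a deep-learning framework.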
Pages: 3199-3207
Page count: 9