Cross-modal knowledge distillation for continuous sign language recognition

Cited by: 0
Authors
Gao, Liqing [1]
Shi, Peng [1]
Hu, Lianyu [1]
Feng, Jichao [1]
Zhu, Lei [2]
Wan, Liang [1]
Feng, Wei [1]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin 300350, Peoples R China
[2] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China
Keywords
Sign language recognition; Knowledge distillation; Cross-modal; Attention mechanism;
DOI
10.1016/j.neunet.2024.106587
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Continuous Sign Language Recognition (CSLR) is the task of converting a sign language video into a gloss sequence. Existing deep-learning-based sign language recognition methods usually rely on large-scale training data and rich supervision. However, current sign language datasets are limited in size and are annotated only at the sentence level rather than the frame level. This inadequate supervision poses a serious challenge and may leave sign language recognition models insufficiently trained. To address these problems, we propose a cross-modal knowledge distillation method for continuous sign language recognition that comprises two teacher models and one student model. The first teacher is the Sign2Text dialogue teacher model, which takes a sign language video and a dialogue sentence as input and outputs the sign language recognition result. The second teacher is the Text2Gloss translation teacher model, which translates a text sentence into a gloss sequence. Both teacher models provide information-rich soft labels to assist the training of the student model, which is a general sign language recognition model. We conduct extensive experiments on multiple commonly used sign language datasets, i.e., PHOENIX 2014T, CSL-Daily and QSL. The results show that the proposed cross-modal knowledge distillation method effectively improves sign language recognition accuracy by transferring multi-modal information from the teacher models to the student model. Code is available at https://github.com/glq-1992/cross-modal-knowledge-distillation_new.
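The abstract does not give the loss used to transfer the teachers' soft labels to the student. As a rough illustration only, the sketch below uses the generic temperature-scaled soft-label distillation objective (KL divergence between teacher and student distributions) for a single frame over a toy gloss vocabulary. The function names, the temperature, and the equal teacher weights are assumptions made for this example and are not taken from the paper or its repository.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def two_teacher_distillation_loss(student_logits, teacher_a_logits, teacher_b_logits,
                                  temperature=2.0, weight_a=0.5, weight_b=0.5):
    """Weighted sum of KL terms pulling the student toward each teacher's soft labels.

    Hypothetical helper: the weights and temperature are illustrative defaults,
    not values reported by the authors.
    """
    student = softmax(student_logits, temperature)
    soft_a = softmax(teacher_a_logits, temperature)  # e.g. a Sign2Text-style teacher
    soft_b = softmax(teacher_b_logits, temperature)  # e.g. a Text2Gloss-style teacher
    loss_a = kl_divergence(soft_a, student)
    loss_b = kl_divergence(soft_b, student)
    # The T^2 factor is the usual rescaling that keeps gradient magnitudes comparable.
    return (temperature ** 2) * (weight_a * loss_a + weight_b * loss_b)


if __name__ == "__main__":
    student = [1.0, 0.5, -0.2]
    teacher_a = [2.0, 0.1, -1.0]
    teacher_b = [1.5, 0.8, -0.5]
    print(two_teacher_distillation_loss(student, teacher_a, teacher_b))
```

In a real CSLR system this term would typically be averaged over frames or gloss positions and added to the student's main recognition loss; how the authors actually combine the two teachers is described in the paper itself, not here.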
Pages: 13