Cross-modal knowledge distillation for continuous sign language recognition

Cited: 0
Authors
Gao, Liqing [1 ]
Shi, Peng [1 ]
Hu, Lianyu [1 ]
Feng, Jichao [1 ]
Zhu, Lei [2 ]
Wan, Liang [1 ]
Feng, Wei [1 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin 300350, Peoples R China
[2] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China
Keywords
Sign language recognition; Knowledge distillation; Cross-modal; Attention mechanism;
DOI
10.1016/j.neunet.2024.106587
CLC classification
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Continuous Sign Language Recognition (CSLR) is a task that converts a sign language video into a gloss sequence. Existing deep-learning-based sign language recognition methods usually rely on large-scale training data and rich supervision. However, current sign language datasets are limited and are annotated only at the sentence level rather than the frame level. Such inadequate supervision poses a serious challenge for sign language recognition and may result in insufficient training of recognition models. To address these problems, we propose a cross-modal knowledge distillation method for continuous sign language recognition, which contains two teacher models and one student model. One teacher is the Sign2Text dialogue teacher model, which takes a sign language video and a dialogue sentence as input and outputs the sign language recognition result. The other teacher is the Text2Gloss translation teacher model, which aims to translate a text sentence into a gloss sequence. Both teacher models provide information-rich soft labels to assist the training of the student model, which is a general sign language recognition model. We conduct extensive experiments on multiple commonly used sign language datasets, i.e., PHOENIX 2014T, CSL-Daily, and QSL; the results show that the proposed cross-modal knowledge distillation method effectively improves sign language recognition accuracy by transferring multi-modal information from the teacher models to the student model. Code is available at https://github.com/glq-1992/cross-modal-knowledge-distillation_new.
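The record does not reproduce the paper's loss function, but the two-teacher soft-label scheme it describes can be illustrated with standard temperature-scaled knowledge distillation. The sketch below is an assumption, not the authors' implementation: each teacher's logits are softened with a temperature, and the student is penalized by the average KL divergence from the teachers' distributions (all function names here are illustrative).

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher temperature yields softer
    # label distributions, exposing more of the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q): divergence of the student distribution q from teacher p.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(student_logits, teacher_logits_list, temperature=4.0):
    # Average the KL terms over all teachers, mirroring the two-teacher
    # (Sign2Text + Text2Gloss) setup described in the abstract.
    q = softmax(student_logits, temperature)
    losses = [kl_divergence(softmax(t, temperature), q)
              for t in teacher_logits_list]
    return sum(losses) / len(losses)
```

In practice this distillation term would be weighted and combined with the task loss (e.g. CTC for CSLR); that combination is omitted here for brevity.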
Pages: 13