Cross-modal knowledge distillation for continuous sign language recognition

Times cited: 0
Authors
Gao, Liqing [1 ]
Shi, Peng [1 ]
Hu, Lianyu [1 ]
Feng, Jichao [1 ]
Zhu, Lei [2 ]
Wan, Liang [1 ]
Feng, Wei [1 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin 300350, Peoples R China
[2] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China
Keywords
Sign language recognition; Knowledge distillation; Cross-modal; Attention mechanism;
DOI
10.1016/j.neunet.2024.106587
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Continuous Sign Language Recognition (CSLR) is the task of converting a sign language video into a gloss sequence. Existing deep-learning-based sign language recognition methods usually rely on large-scale training data and rich supervision. However, current sign language datasets are limited and are annotated only at the sentence level rather than the frame level. This inadequate supervision poses a serious challenge for sign language recognition and may leave recognition models insufficiently trained. To address these problems, we propose a cross-modal knowledge distillation method for continuous sign language recognition that contains two teacher models and one student model. One teacher is the Sign2Text dialogue teacher model, which takes a sign language video and a dialogue sentence as input and outputs the sign language recognition result. The other teacher is the Text2Gloss translation teacher model, which translates a text sentence into a gloss sequence. Both teacher models provide information-rich soft labels to assist the training of the student model, a general sign language recognition model. We conduct extensive experiments on several commonly used sign language datasets, i.e., PHOENIX 2014T, CSL-Daily, and QSL; the results show that the proposed cross-modal knowledge distillation method effectively improves sign language recognition accuracy by transferring multi-modal information from the teacher models to the student model. Code is available at https://github.com/glq-1992/cross-modal-knowledge-distillation_new.
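To make the distillation objective described above concrete, here is a minimal PyTorch sketch of two-teacher soft-label distillation. It is an illustration under our own assumptions, not the paper's implementation: the function names and loss weights are hypothetical, all three models are assumed to emit per-frame gloss logits of the same shape, and the paper's full objective (e.g., its CTC and attention components) is not reproduced.

```python
# Illustrative two-teacher soft-label distillation (hypothetical names and
# loss weights; not the authors' released code).
import torch
import torch.nn.functional as F

def soft_label_loss(student_logits, teacher_logits, T=2.0):
    # Temperature-softened KL divergence between teacher and student
    # distributions, as in standard knowledge distillation.
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

def student_loss(student_logits, sign2text_logits, text2gloss_logits,
                 task_loss, alpha=1.0, beta=0.5):
    # Combine the student's own supervised loss (typically CTC in CSLR)
    # with soft labels from the Sign2Text and Text2Gloss teachers.
    return (alpha * task_loss
            + beta * soft_label_loss(student_logits, sign2text_logits)
            + beta * soft_label_loss(student_logits, text2gloss_logits))

# Toy usage: batch of 2 videos, 16 frames, a 100-gloss vocabulary.
student = torch.randn(2, 16, 100, requires_grad=True)
t_sign2text = torch.randn(2, 16, 100)
t_text2gloss = torch.randn(2, 16, 100)
loss = student_loss(student, t_sign2text, t_text2gloss,
                    task_loss=torch.tensor(1.0))
loss.backward()
```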
Pages: 13
Related papers
50 records in total
  • [41] CKDH: CLIP-Based Knowledge Distillation Hashing for Cross-Modal Retrieval
    Li, Jiaxing
    Wong, Wai Keung
    Jiang, Lin
    Fang, Xiaozhao
    Xie, Shengli
    Xu, Yong
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 6530 - 6541
  • [42] Self-Mutual Distillation Learning for Continuous Sign Language Recognition
    Hao, Aiming
    Min, Yuecong
    Chen, Xilin
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11283 - 11292
  • [43] Cross-modal distillation for flood extent mapping
    Garg, Shubhika
    Feinstein, Ben
    Timnat, Shahar
    Batchu, Vishal
    Dror, Gideon
    Rosenthal, Adi Gerzi
    Gulshan, Varun
    [J]. ENVIRONMENTAL DATA SCIENCE, 2023, 2
  • [44] Cross-Modal Knowledge Adaptation for Language-Based Person Search
    Chen, Yucheng
    Huang, Rui
    Chang, Hong
    Tan, Chuanqi
    Xue, Tao
    Ma, Bingpeng
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 4057 - 4069
  • [45] CROSS-MODAL KNOWLEDGE DISTILLATION FOR FINE-GRAINED ONE-SHOT CLASSIFICATION
    Zhao, Jiabao
    Lin, Xin
    Yang, Yifan
    Yang, Jing
    He, Liang
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4295 - 4299
  • [46] Social Image-Text Sentiment Classification With Cross-Modal Consistency and Knowledge Distillation
    Liu, Huan
    Li, Ke
    Fan, Jianping
    Yan, Caixia
    Qin, Tao
    Zheng, Qinghua
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (04) : 3332 - 3344
  • [47] Knowledge Distillation on Cross-Modal Adversarial Reprogramming for Data-Limited Attribute Inference
    Li, Quan
    Chen, Lingwei
    Jing, Shixiong
    Wu, Dinghao
    [J]. COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023, : 65 - 68
  • [48] HAPTIC AND CROSS-MODAL RECOGNITION IN CHILDREN
    BUSHNELL, EW
    [J]. BULLETIN OF THE PSYCHONOMIC SOCIETY, 1991, 29 (06) : 499 - 499
  • [49] Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
    Wu, Wenhao
    Wang, Xiaohan
    Luo, Haipeng
    Wang, Jingdong
    Yang, Yi
    Ouyang, Wanli
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6620 - 6630
  • [50] Cross-modal attention and letter recognition
    Wesner, Michael
    Miller, Lisa
    [J]. INTERNATIONAL JOURNAL OF PSYCHOLOGY, 2008, 43 (3-4) : 343 - 343