Multi-modal Sign Language Recognition with Enhanced Spatiotemporal Representation

Cited by: 0
Authors
Xiao, Shiwei [1]
Fang, Yuchun [1]
Ni, Lan [2]
Institutions
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai, Peoples R China
[2] Shanghai Univ, Coll Liberal Arts, Shanghai, Peoples R China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China;
Keywords
Sign language recognition; Pseudo-3D blocks; soft attention model; multi-task learning;
DOI
10.1109/IJCNN52387.2021.9533707
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Sign language recognition (SLR) has become increasingly popular in computer vision in recent years. Extracting discriminative spatiotemporal features is essential for modeling the spatial and temporal evolution of different signs, and representations of local gestures and facial expressions help distinguish signs with similar motion patterns but different meanings. In this paper, we propose a multi-modal sign language recognition framework. In the RGB representation model, we design adaptive spatiotemporal attention modules to capture the visual cues in signing videos, and we design an adapter that constructs an auxiliary task, which is learned jointly with the SLR task to enhance the performance of the model. Given a signing video, a spatiotemporal attention-based Pseudo-3D Residual Network (STA P3D ResNet) learns spatiotemporal features mainly from the regions of interest and the key frames. After feature extraction, an attention-based Bidirectional Long Short-Term Memory network (Att-BLSTM) selects the significant motions. Meanwhile, from the skeletal data we obtain a texture image by color encoding and construct spatial relation features, which are high-level representations of human posture. The learned skeleton-based features are fused with the attention-aware video features to provide more informative spatiotemporal cues for SLR. Experiments on two large-scale sign language datasets demonstrate the effectiveness of the proposed method.
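To illustrate the temporal-attention and fusion stage described in the abstract, below is a minimal PyTorch-style sketch, not the authors' implementation: it shows soft-attention pooling over a bidirectional LSTM applied to frame-level RGB features, late-fused with a skeleton-based feature vector for classification. Module and tensor names (AttBLSTMFusion, rgb_feats, skel_feats) and all dimensions are illustrative assumptions.

```python
# Minimal sketch (assumed PyTorch), not the authors' code: soft attention over a
# BLSTM's frame-level outputs, fused with skeleton features for classification.
import torch
import torch.nn as nn


class AttBLSTMFusion(nn.Module):
    def __init__(self, rgb_dim=2048, skel_dim=256, hidden=512, num_classes=500):
        super().__init__()
        # Bidirectional LSTM over per-frame RGB features (e.g. from a P3D backbone).
        self.blstm = nn.LSTM(rgb_dim, hidden, batch_first=True, bidirectional=True)
        # Soft attention: one scalar score per time step.
        self.attn = nn.Linear(2 * hidden, 1)
        # Classify the fused video + skeleton representation.
        self.classifier = nn.Linear(2 * hidden + skel_dim, num_classes)

    def forward(self, rgb_feats, skel_feats):
        # rgb_feats: (B, T, rgb_dim) frame features; skel_feats: (B, skel_dim)
        h, _ = self.blstm(rgb_feats)                   # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)         # (B, T, 1) weights over frames
        video = (w * h).sum(dim=1)                     # (B, 2*hidden) attention pooling
        fused = torch.cat([video, skel_feats], dim=1)  # late fusion of the two modalities
        return self.classifier(fused)


if __name__ == "__main__":
    model = AttBLSTMFusion()
    logits = model(torch.randn(2, 32, 2048), torch.randn(2, 256))
    print(logits.shape)  # torch.Size([2, 500])
```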
Pages: 8
Related Papers (50 in total)
  • [1] Jiang, Songyao; Sun, Bin; Wang, Lichen; Bai, Yue; Li, Kunpeng; Fu, Yun. Skeleton Aware Multi-modal Sign Language Recognition. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2021), 2021: 3408-3418.
  • [2] Gu Mingqin; Chen Xiaohua; Zhang Shaoyong; Ren Xiaoping. Traffic Sign Recognition Based on Parameter-Free Detector and Multi-modal Representation. Algorithms and Architectures for Parallel Processing, ICA3PP 2016 Collocated Workshops, 2016, 10049: 115-124.
  • [3] Sadeghzadeh, Arezoo; Shah, A. F. M. Shahen; Islam, Md Baharul. MLMSign: Multi-lingual Multi-modal Illumination-invariant Sign Language Recognition. Intelligent Systems with Applications, 2024, 22.
  • [4] Chu Chaoqin; Xiao Qinkun; Zhang Yinhuan; Xing, Liu. Multi-Modal Fusion Sign Language Recognition Based on Residual Network and Attention Mechanism. International Journal of Pattern Recognition and Artificial Intelligence, 2022, 36 (12).
  • [5] Sharma, Sneha; Gupta, Rinki; Kumar, Arun. On the Use of Multi-Modal Sensing in Sign Language Classification. 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN), 2019: 495-500.
  • [6] Hruz, M.; Campr, P.; Krnoul, Z.; Zelezny, M.; Aran, Oya; Santemiz, Pinar. Multi-modal Dialogue System with Sign Language Capabilities. ASSETS '11: Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility, 2011: 265-266.
  • [7] Zhang, Duoyi; Wang, Yue; Abul Bashar, Md; Nayak, Richi. Enhanced Topic Modeling with Multi-modal Representation Learning. Advances in Knowledge Discovery and Data Mining, PAKDD 2023, Part I, 2023, 13935: 393-404.
  • [8] Rastgoo, Razieh; Kiani, Kourosh; Escalera, Sergio. Multi-Modal Deep Hand Sign Language Recognition in Still Images Using Restricted Boltzmann Machine. Entropy, 2018, 20 (11).
  • [9] Shen, Haihong; Ma, Liqun; Zhang, Qishan. Multi-Modal Face Recognition. 2nd IEEE International Conference on Advanced Computer Control (ICACC 2010), Vol. 5, 2010: 612-616.