Skeleton Aware Multi-modal Sign Language Recognition

Cited by: 93
Authors
Jiang, Songyao [1]
Sun, Bin [1]
Wang, Lichen [1]
Bai, Yue [1]
Li, Kunpeng [1]
Fu, Yun [1]
Affiliations
[1] Northeastern Univ, Boston, MA 02115 USA
DOI
10.1109/CVPRW53098.2021.00380
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sign language is commonly used by deaf or speech-impaired people to communicate, but it requires significant effort to master. Sign Language Recognition (SLR) aims to bridge the gap between sign language users and others by recognizing signs from given videos. It is an essential yet challenging task, since sign language is performed with fast and complex movements of the hands, body posture, and even facial expressions. Recently, skeleton-based action recognition has attracted increasing attention because it is independent of subject appearance and background variation. However, skeleton-based SLR remains under-explored due to the lack of annotations for hand keypoints. Some efforts have combined hand detectors with pose estimators to extract hand keypoints and recognize sign language via neural networks, but none of them outperforms RGB-based methods. To this end, we propose a novel Skeleton Aware Multi-modal SLR framework (SAM-SLR) that exploits multi-modal information to achieve a higher recognition rate. Specifically, we propose a Sign Language Graph Convolution Network (SL-GCN) to model the embedded dynamics and a novel Separable Spatial-Temporal Convolution Network (SSTCN) to exploit skeleton features. RGB and depth modalities are also incorporated and assembled into our framework to provide global information complementary to the skeleton-based SL-GCN and SSTCN. As a result, SAM-SLR achieves the highest performance in both the RGB (98.42%) and RGB-D (98.53%) tracks of the 2021 Looking at People Large Scale Signer Independent Isolated SLR Challenge. Our code is available at https://github.com/jackyjsy/CVPR21Chal-SLR.
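The abstract describes a late-fusion design in which the skeleton-based streams (SL-GCN, SSTCN) are combined with RGB and depth predictions. The sketch below illustrates one common way such an ensemble can be realized: a weighted sum of per-stream softmax scores. The stream names, the fusion weights, and the 226-class setting (matching the AUTSL benchmark used by the challenge) are illustrative assumptions, not the authors' exact implementation; see the linked repository for the actual code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the class axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_streams(logits_by_stream, weights):
    # Late fusion: weighted sum of per-stream softmax scores,
    # then argmax over classes to get the predicted sign.
    fused = sum(weights[name] * softmax(logits)
                for name, logits in logits_by_stream.items())
    return int(np.argmax(fused))

# Hypothetical logits for one test clip from four streams; 226 classes
# is an assumption matching the AUTSL benchmark used by the challenge.
num_classes = 226
rng = np.random.default_rng(0)
streams = {name: rng.normal(size=num_classes)
           for name in ("sl_gcn", "sstcn", "rgb", "depth")}
weights = {"sl_gcn": 1.0, "sstcn": 0.5, "rgb": 0.9, "depth": 0.4}  # illustrative
print(fuse_streams(streams, weights))  # predicted class index
```

Tuning the per-stream weights on validation data lets the skeleton streams dominate while RGB and depth contribute the complementary global cues the abstract describes.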
Pages: 3408-3418
Number of pages: 11