MASA: Motion-Aware Masked Autoencoder With Semantic Alignment for Sign Language Recognition

Cited by: 0
Authors
Zhao, Weichao [1 ]
Hu, Hezhen [2 ]
Zhou, Wengang [1 ]
Mao, Yunyao [1 ]
Wang, Min [3 ]
Li, Houqiang [1 ]
Affiliations
[1] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, CAS Key Lab Technol Geospatial Informat Proc & Ap, Hefei 230027, Peoples R China
[2] Univ Texas Austin, Visual Informat Grp, Austin, TX 78705 USA
[3] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230030, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Masked autoencoder; motion-aware; semantic alignment; sign language recognition;
DOI
10.1109/TCSVT.2024.3409728
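Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code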
0808 ; 0809 ;
Abstract
Sign language recognition (SLR) has long been plagued by insufficient model representation capabilities. Although current pre-training approaches have alleviated this dilemma to some extent and yielded promising performance by employing various pretext tasks on sign pose data, these methods still suffer from two primary limitations: i) Explicit motion information is usually disregarded in previous pretext tasks, leading to partial information loss and limited representation capability. ii) Previous methods focus on the local context of a sign pose sequence, without incorporating the guidance of the global meaning of lexical signs. To this end, we propose a Motion-Aware masked autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information in a self-supervised learning paradigm for SLR. Our framework contains two crucial components, i.e., a motion-aware masked autoencoder (MA) and a momentum semantic alignment module (SA). Specifically, in MA, we introduce an autoencoder architecture with a motion-aware masked strategy to reconstruct motion residuals of masked frames, thereby explicitly exploring dynamic motion cues among sign pose sequences. Moreover, in SA, we embed our framework with global semantic awareness by aligning the embeddings of different augmented samples from the input sequence in the shared latent space. In this way, our framework can simultaneously learn local motion cues and global semantic features for comprehensive sign language representation. Furthermore, we conduct extensive experiments to validate the effectiveness of our method, achieving new state-of-the-art performance on four public benchmarks. The source code is publicly available at https://github.com/sakura/MASA.
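The two objectives named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: that a motion residual is the difference between adjacent pose frames, that the reconstruction loss is a mean squared error over masked frames only, that the momentum branch is updated by an exponential moving average, and that alignment is measured by cosine similarity. All function names and the array layout (frames, joints, coordinates) are hypothetical.

```python
import numpy as np

def motion_residuals(poses):
    """Assumed definition: frame-to-frame pose differences.
    poses has shape (frames, joints, coords); the first frame gets a zero residual."""
    residuals = np.zeros_like(poses)
    residuals[1:] = poses[1:] - poses[:-1]
    return residuals

def masked_residual_loss(predicted, poses, mask):
    """Mean squared error between predicted and true residuals,
    computed only on frames where the boolean mask is True."""
    target = motion_residuals(poses)
    diff = (predicted - target)[mask]
    return float(np.mean(diff ** 2))

def momentum_update(online, target, m=0.99):
    """Exponential moving average of parameters (assumed form of the momentum branch)."""
    return m * target + (1.0 - m) * online

def alignment_loss(z_a, z_b, eps=1e-8):
    """1 - cosine similarity between embeddings of two augmented views."""
    cos = np.dot(z_a, z_b) / (np.linalg.norm(z_a) * np.linalg.norm(z_b) + eps)
    return float(1.0 - cos)

# Toy pose sequence: 4 frames, 2 joints, 1 coordinate.
poses = np.arange(8, dtype=float).reshape(4, 2, 1)
mask = np.array([False, True, False, True])  # frames treated as masked
perfect_prediction = motion_residuals(poses)
loss = masked_residual_loss(perfect_prediction, poses, mask)
```

In this toy run a prediction equal to the true residuals gives a loss of zero on the masked frames. A real model would produce the prediction from the unmasked frames with an encoder-decoder, which this sketch leaves out.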
Pages: 10793-10804
Number of pages: 12
Related papers
50 in total
  • [21] Skeleton aware multi-modal sign language recognition
    Jiang, Songyao
    Sun, Bin
    Wang, Lichen
    Bai, Yue
    Li, Kunpeng
    Fu, Yun
    IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2021, : 3408 - 3418
  • [22] Hand pose aware multimodal isolated sign language recognition
    Razieh Rastgoo
    Kourosh Kiani
    Sergio Escalera
    Multimedia Tools and Applications, 2021, 80 : 127 - 163
  • [23] Dynamical semantic enhancement network for continuous sign language recognition
    Wang, Suyang
    Guo, Leming
    Xue, Wanli
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [24] Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder
    Lakhal, Mohamed Ilyes
    Bowden, Richard
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024,
  • [25] A framework for motion recognition with applications to American sign language and gait recognition
    Vogler, C
    Sun, H
    Metaxas, D
    WORKSHOP ON HUMAN MOTION, PROCEEDINGS, 2000, : 33 - 38
  • [26] American Sign Language Recognition Using Leap Motion Sensor
    Chuan, Ching-Hua
    Regina, Eric
    Guardino, Caroline
    2014 13TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2014, : 541 - 544
  • [27] A Chinese Sign Language Recognition System Using Leap Motion
    Xue, Yaofeng
    Gao, Shang
    Sun, Huali
    Qin, Wei
    2017 INTERNATIONAL CONFERENCE ON VIRTUAL REALITY AND VISUALIZATION (ICVRV 2017), 2017, : 180 - 185
  • [28] Indonesian Sign Language Recognition Using Leap Motion Controller
    Wibowo, Midarto Dwi
    Nurtanio, Ingrid
    Ilham, Amil Ahmad
    PROCEEDINGS OF 2017 11TH INTERNATIONAL CONFERENCE ON INFORMATION & COMMUNICATION TECHNOLOGY AND SYSTEMS (ICTS), 2017, : 67 - 71
  • [29] Arabic Sign Language Recognition Using Leap Motion Sensor
    Elons, A. S.
    Ahmed, Menna
    Shedid, Hwaidaa
    Tolba, M. F.
    2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2014, : 368 - 373
  • [30] Arabic Sign Language Recognition using the Leap Motion Controller
    Mohandes, M.
    Aliyu, S.
    Deriche, M.
    2014 IEEE 23RD INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS (ISIE), 2014, : 960 - 965