MASA: Motion-Aware Masked Autoencoder With Semantic Alignment for Sign Language Recognition

Authors
Zhao, Weichao [1]
Hu, Hezhen [2]
Zhou, Wengang [1]
Mao, Yunyao [1]
Wang, Min [3]
Li, Houqiang [1]
Affiliations
[1] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, CAS Key Lab Technol Geospatial Informat Proc & Ap, Hefei 230027, Peoples R China
[2] Univ Texas Austin, Visual Informat Grp, Austin, TX 78705 USA
[3] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230030, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Masked autoencoder; motion-aware; semantic alignment; sign language recognition;
DOI
10.1109/TCSVT.2024.3409728
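Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Subject classification code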
0808 ; 0809 ;
Abstract
Sign language recognition (SLR) has long been plagued by insufficient model representation capabilities. Although current pre-training approaches have alleviated this dilemma to some extent and yielded promising performance by employing various pretext tasks on sign pose data, these methods still suffer from two primary limitations: i) Explicit motion information is usually disregarded in previous pretext tasks, leading to partial information loss and limited representation capability. ii) Previous methods focus on the local context of a sign pose sequence, without incorporating the guidance of the global meaning of lexical signs. To this end, we propose a Motion-Aware masked autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information in a self-supervised learning paradigm for SLR. Our framework contains two crucial components, i.e., a motion-aware masked autoencoder (MA) and a momentum semantic alignment module (SA). Specifically, in MA, we introduce an autoencoder architecture with a motion-aware masked strategy to reconstruct motion residuals of masked frames, thereby explicitly exploring dynamic motion cues among sign pose sequences. Moreover, in SA, we embed our framework with global semantic awareness by aligning the embeddings of different augmented samples from the input sequence in the shared latent space. In this way, our framework can simultaneously learn local motion cues and global semantic features for comprehensive sign language representation. Furthermore, we conduct extensive experiments to validate the effectiveness of our method, achieving new state-of-the-art performance on four public benchmarks. The source code is publicly available at https://github.com/sakura/MASA.
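The two ideas named in the abstract, reconstructing motion residuals of masked frames and aligning embeddings through a momentum branch, can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the function names, the definition of a motion residual as a frame-to-frame difference, the motion-weighted sampling of masked frames, the mean-squared-error objective, the exponential-moving-average update, and the cosine form of the alignment loss.

```python
import numpy as np


def motion_residuals(poses):
    """Frame-to-frame differences of a pose sequence of shape (T, J, C).

    Assumption: a "motion residual" is pose[t] - pose[t-1]; the first
    frame has no predecessor, so its residual is set to zero.
    """
    res = np.zeros_like(poses)
    res[1:] = poses[1:] - poses[:-1]
    return res


def sample_motion_aware_mask(residuals, mask_ratio, rng):
    """Choose frames to mask, favoring frames with larger motion.

    Assumption: sampling probability is proportional to the summed
    absolute residual of each frame (a hypothetical heuristic).
    """
    num_frames = residuals.shape[0]
    magnitude = np.abs(residuals).reshape(num_frames, -1).sum(axis=1) + 1e-6
    probs = magnitude / magnitude.sum()
    num_masked = max(1, int(round(mask_ratio * num_frames)))
    idx = rng.choice(num_frames, size=num_masked, replace=False, p=probs)
    mask = np.zeros(num_frames, dtype=bool)
    mask[idx] = True
    return mask


def masked_residual_loss(pred, target, mask):
    """Mean squared error computed on the masked frames only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


def ema_update(momentum_params, online_params, m=0.99):
    """Exponential moving average update for a momentum branch."""
    return m * momentum_params + (1.0 - m) * online_params


def alignment_loss(z_online, z_momentum):
    """One minus the cosine similarity of two embedding vectors."""
    a = z_online / np.linalg.norm(z_online)
    b = z_momentum / np.linalg.norm(z_momentum)
    return float(1.0 - a @ b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    poses = rng.normal(size=(16, 21, 2))  # 16 frames, 21 joints, (x, y)
    target = motion_residuals(poses)
    mask = sample_motion_aware_mask(target, mask_ratio=0.5, rng=rng)
    pred = np.zeros_like(target)  # stand-in for a decoder output
    print("masked frames:", int(mask.sum()))
    print("reconstruction loss:", masked_residual_loss(pred, target, mask))
```

In an actual model the prediction would come from an encoder-decoder network and the two embeddings from differently augmented views of the same sequence; the placeholder array here only shows where each quantity enters the loss.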
Pages: 10793-10804
Number of pages: 12