Sign language recognition from digital videos using feature pyramid network with detection transformer

Cited by: 7
Authors
Liu, Yu [1]
Nand, Parma [1]
Hossain, Md Akbar [1]
Nguyen, Minh [1]
Yan, Wei Qi [1]
Affiliations
[1] Auckland Univ Technol, Auckland 1010, New Zealand
Keywords
Sign language recognition; ResNet152; Detection transformer; Feature pyramid network
DOI
10.1007/s11042-023-14646-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sign language recognition is one of the fundamental ways to help deaf people communicate with others. An accurate vision-based sign language recognition system using deep learning is a goal shared by many researchers. Deep convolutional neural networks have been studied extensively in the last few years, and many architectures have been proposed. Recently, Vision Transformer and other Transformers have shown clear advantages in object recognition over traditional computer vision models such as Faster R-CNN, YOLO, SSD, and other deep learning models. In this paper, we propose a Vision Transformer-based sign language recognition method called DETR (Detection Transformer), aiming to improve on the current state-of-the-art sign language recognition accuracy. The proposed DETR method recognizes sign language from digital videos with high accuracy using a new deep learning model, ResNet152 + FPN (Feature Pyramid Network), which is based on Detection Transformer. Our experiments show that the method has excellent potential for improving sign language recognition accuracy. For instance, the newly proposed ResNet152 + FPN network improves detection accuracy by up to 1.70% on the sign language test dataset compared to the standard Detection Transformer models. In addition, the proposed method attained an overall accuracy of 96.45%.
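The abstract names a Feature Pyramid Network on top of a ResNet152 backbone but gives no architectural detail. As a rough illustration of the general FPN idea only (not the authors' implementation), the sketch below merges multi-scale feature maps top-down: each coarser map is upsampled by 2 and added to the next finer one. The function names, the single-channel toy maps, and the omission of the 1x1 lateral and 3x3 smoothing convolutions are all simplifying assumptions made here.

```python
# Minimal sketch of an FPN-style top-down pathway (pure Python, toy data).
# Assumptions: single-channel maps, identity lateral connections,
# no smoothing convolution. Names below are hypothetical.

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D list of numbers."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def fpn_top_down(backbone_maps):
    """Merge backbone maps given in fine-to-coarse order.

    Each level is assumed to be half the height and width of the
    previous one. Returns the pyramid in the same fine-to-coarse order.
    """
    pyramid = [backbone_maps[-1]]                  # coarsest level passes through
    for lateral in reversed(backbone_maps[:-1]):
        merged = add_maps(lateral, upsample2x(pyramid[0]))
        pyramid.insert(0, merged)
    return pyramid

# Toy backbone outputs at three scales (stand-ins for ResNet stage outputs).
c3 = [[1] * 4 for _ in range(4)]   # 4x4
c4 = [[10, 20], [30, 40]]          # 2x2
c5 = [[100]]                       # 1x1
p3, p4, p5 = fpn_top_down([c3, c4, c5])
```

In a detection model of the kind the abstract describes, maps like `p3`, `p4`, and `p5` would carry coarse semantic information down to the finer resolutions before being passed on to the detector; how the paper wires them into the Detection Transformer is not stated in this record.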
Pages: 21673-21685
Number of pages: 13
Journal: Multimedia Tools and Applications, 2023, vol. 82