Character-level Arabic text generation from sign language video using encoder-decoder model

Cited by: 2
Authors
Boukdir, Abdelbasset [1]
Benaddy, Mohamed [1]
El Meslouhi, Othmane [2]
Kardouchi, Mustapha [3]
Akhloufi, Moulay [3]
Affiliations
[1] Ibn Zohr Univ, FSA PFO, LabSI Lab, Ouarzazate, Morocco
[2] Cadi Ayyad Univ, Natl Sch Appl Sci Safi, SARS Grp, Safi, Morocco
[3] Univ Moncton, Dept Comp Sci, PRIME Grp, Moncton, NB, Canada
Keywords
Arabic text; Pose estimation; Video caption; Deep learning; Gated Recurrent Unit; Neural network
DOI
10.1016/j.displa.2022.102340
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Video-to-text conversion is an important task in computer vision. In recent years, deep learning algorithms have dominated automatic text generation in English, but few research works are available for other languages. In this paper, we propose a novel encoder-decoder system that generates character-level Arabic sentences from isolated RGB videos of Moroccan Sign Language. The video sequence is encoded through spatiotemporal feature extraction using pose estimation models, while the label text of the video is transformed into a sequence of representative vectors. The features and the label vectors are then joined and processed by a decoder layer to produce the final prediction. We trained the proposed system on an isolated Moroccan Sign Language dataset (MoSLD) composed of RGB videos of 125 MoSL signs. The experimental results show that the proposed model attains the best performance under several evaluation metrics.
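The step in which the label text becomes a sequence of vectors and is joined with the video features can be illustrated with a small sketch. This is not the authors' implementation: the function names, the start/end markers, the example labels, and the 4-value feature vector are all assumptions made for illustration, and one-hot encoding is only one plausible choice of "representative vectors". The sketch builds teacher-forcing pairs for a character-level decoder, where each input is the video feature concatenated with the previous character's vector and each target is the next character.

```python
# Hypothetical sketch: all names and sizes are assumptions, not from the paper.
START, END = "<s>", "</s>"  # assumed start/end-of-sentence markers


def build_vocab(labels):
    """Map start/end markers and every character seen in the labels to an index."""
    chars = sorted({ch for text in labels for ch in text})
    return {tok: i for i, tok in enumerate([START, END] + chars)}


def one_hot(index, size):
    """Return a list of length `size` that is 1.0 at `index` and 0.0 elsewhere."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def decoder_steps(video_feature, label, vocab):
    """Build (joined input, target index) pairs for teacher forcing.

    Each joined input is the video feature followed by the one-hot vector
    of the previous character; the target is the index of the next character.
    """
    tokens = [START] + list(label) + [END]
    steps = []
    for prev, nxt in zip(tokens, tokens[1:]):
        joined = list(video_feature) + one_hot(vocab[prev], len(vocab))
        steps.append((joined, vocab[nxt]))
    return steps


labels = ["شكرا", "سلام"]        # illustrative Arabic labels, not from MoSLD
vocab = build_vocab(labels)
feature = [0.1, 0.2, 0.3, 0.4]   # stand-in for a pooled pose-feature vector
steps = decoder_steps(feature, labels[0], vocab)
```

In a full system, each joined input would be fed to a recurrent decoder (the keywords mention a Gated Recurrent Unit) that outputs a probability distribution over the character vocabulary at every step.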
Pages: 9