Character-level Arabic text generation from sign language video using encoder-decoder model

Cited by: 2
Authors
Boukdir, Abdelbasset [1]
Benaddy, Mohamed [1]
El Meslouhi, Othmane [2]
Kardouchi, Mustapha [3]
Akhloufi, Moulay [3]
Affiliations
[1] Ibn Zohr Univ, FSA PFO, LabSI Lab, Ouarzazate, Morocco
[2] Cadi Ayyad Univ, Natl Sch Appl Sci Safi, SARS Grp, Safi, Morocco
[3] Univ Moncton, Dept Comp Sci, PRIME Grp, Moncton, NB, Canada
Keywords
Arabic text; Pose estimation; Video caption; Deep learning; Gated Recurrent Unit; Neural network
DOI
10.1016/j.displa.2022.102340
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Video-to-text conversion is an important task in computer vision. In recent years, deep learning algorithms have dominated automatic text generation in English, but few research works are available for other languages. In this paper, we propose a novel encoder-decoder system that generates character-level Arabic sentences from isolated RGB videos of Moroccan Sign Language. The video sequence is encoded by spatiotemporal feature extraction using pose estimation models, while the label text of the video is converted into a sequence of representative vectors. The features and the label vectors are then joined and processed by a decoder layer to produce the final prediction. We trained the proposed system on an isolated Moroccan Sign Language dataset (MoSLD) composed of RGB videos of 125 MoSL signs. The experimental results show that the proposed model achieves the best performance under several evaluation metrics.
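The label-side step described in the abstract (turning the label text into a sequence of vectors and joining it with the video features before decoding) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the one-hot encoding, the start and end markers, the teacher-forcing layout, the three-value feature vector, and all function names are hypothetical and are not taken from the paper. The decoder itself (the keywords mention a Gated Recurrent Unit) is not implemented.

```python
# Hypothetical sketch: tokens, sizes, and helper names are illustrative
# assumptions, not the authors' implementation.
START, END = "<s>", "</s>"

def build_vocab(labels):
    """Map every character seen in the labels (plus start/end markers) to an index."""
    chars = sorted({ch for text in labels for ch in text})
    return {tok: i for i, tok in enumerate([START, END] + chars)}

def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def encode_label(text, vocab):
    """Turn a label into one-hot character vectors framed by START and END."""
    tokens = [START] + list(text) + [END]
    return [one_hot(vocab[tok], len(vocab)) for tok in tokens]

def decoder_inputs(video_features, label_vectors):
    """Join the video features with every character vector except the last.

    At step t a decoder would receive (video features + character t) and be
    trained to predict character t + 1 (teacher forcing, assumed here).
    """
    inputs = [video_features + vec for vec in label_vectors[:-1]]
    targets = label_vectors[1:]
    return inputs, targets

labels = ["شكرا", "سلام"]          # example Arabic words, not taken from MoSLD
vocab = build_vocab(labels)
label_vectors = encode_label(labels[0], vocab)
video_features = [0.1, 0.2, 0.3]   # stand-in for pose-based spatiotemporal features
inputs, targets = decoder_inputs(video_features, label_vectors)
```

Working at the character level keeps the vocabulary small (here, the distinct characters plus two markers), which is one plausible reason to prefer it over word-level output for a dataset of isolated signs.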
Pages: 9