Boundary Descriptors for Visual Speech Recognition

Cited by: 0
Authors
Gupta, Deepika [1]
Singh, Preety [1]
Laxmi, V. [1]
Gaur, Manoj S. [1]
Affiliation
[1] Malaviya Natl Inst Technol, Dept Comp Engn, Jaipur, Rajasthan, India
Source
COMPUTER AND INFORMATION SCIENCES II | 2012
DOI
10.1007/978-1-4471-2155-8_39
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Lip reading has attracted considerable research interest as a means of improving the performance of automatic speech recognition (Rabiner, L., Juang, B.: Fundamentals of speech recognition. Prentice Hall, New Jersey (1993)). The key issue in visual speech recognition is how to represent the information from the speech articulators as a feature vector. In this paper, we describe the lips by the spatial coordinates of the lip contour, which serve as boundary descriptors. Traditionally, Principal Component Analysis (PCA), the Discrete Cosine Transform (DCT) and the Discrete Fourier Transform (DFT) are applied to pixels from images of the mouth. In our paper, we apply PCA to the spatial points for data reduction, and apply the DCT and DFT directly to the boundary descriptors to transform these spatial coordinates into the frequency domain. The resulting spatial-domain and frequency-domain feature vectors are used to classify the spoken word. An accuracy of 53.4% is obtained in the spatial domain and 54.3% in the frequency domain, which is comparable to results reported in the literature.
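The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the two kinds of feature vectors it describes, under stated assumptions. It assumes each video frame yields an ordered lip contour of N (x, y) points; the synthetic elliptical contours, all function names, the choice to apply the DCT separately to the x and y coordinate sequences, and the use of classic Fourier descriptors (the DFT of the complex sequence x + iy) are illustrative assumptions, not the authors' exact method. PCA is fitted across frames on the flattened coordinates (spatial domain), while the DCT and DFT are applied per frame and truncated to low-order coefficients (frequency domain).

```python
import numpy as np


def synthetic_lip_contour(width: float, height: float, n_points: int = 32) -> np.ndarray:
    """Hypothetical stand-in for a tracked lip contour: an ellipse sampled at
    n_points ordered (x, y) positions, returned with shape (n_points, 2)."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.stack([width * np.cos(t), height * np.sin(t)], axis=1)


def pca_fit(features: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Mean and top principal axes of row-wise feature vectors (one row per frame)."""
    mean = features.mean(axis=0)
    centered = features - mean
    cov = centered.T @ centered / (features.shape[0] - 1)
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]


def pca_transform(features: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Project feature vectors onto the retained principal axes."""
    return (features - mean) @ axes


def dct_ii(signal: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II (same convention as scipy.fft.dct(x, norm="ortho"))."""
    n = signal.shape[0]
    k = np.arange(n)[:, None]  # frequency index
    m = np.arange(n)[None, :]  # sample index
    coeffs = np.cos(np.pi * (2 * m + 1) * k / (2 * n)) @ signal
    coeffs[0] *= np.sqrt(1.0 / n)
    coeffs[1:] *= np.sqrt(2.0 / n)
    return coeffs


def dct_features(contour: np.ndarray, n_coeffs: int) -> np.ndarray:
    """Assumed variant: DCT of the x and y sequences separately, low-order terms kept."""
    x_coeffs = dct_ii(contour[:, 0])
    y_coeffs = dct_ii(contour[:, 1])
    return np.concatenate([x_coeffs[:n_coeffs], y_coeffs[:n_coeffs]])


def dft_features(contour: np.ndarray, n_coeffs: int) -> np.ndarray:
    """Fourier descriptors: DFT of z_m = x_m + i*y_m, magnitudes of low-order terms.
    The DC term (the contour centroid) is dropped, so the result ignores translation."""
    z = contour[:, 0] + 1j * contour[:, 1]
    coeffs = np.fft.fft(z) / len(z)
    return np.abs(coeffs[1:n_coeffs + 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 20 hypothetical frames whose mouth opening (ellipse height) varies
    frames = [synthetic_lip_contour(2.0, h) for h in rng.uniform(0.2, 1.0, size=20)]

    # Spatial domain: flatten the (x, y) coordinates per frame, then reduce with PCA
    spatial = np.stack([c.reshape(-1) for c in frames])  # shape (20, 64)
    mean, axes = pca_fit(spatial, n_components=3)
    print(pca_transform(spatial, mean, axes).shape)  # (20, 3)

    # Frequency domain: transform the boundary descriptors of a single frame
    print(dct_features(frames[0], n_coeffs=4).shape)  # (8,)
    print(dft_features(frames[0], n_coeffs=4).shape)  # (4,)
```

Truncating to a few low-order coefficients plays the role of data reduction in the frequency domain. Taking magnitudes of the Fourier descriptors also makes them insensitive to rotation of the contour and to the choice of starting point, which is one common reason to prefer this representation for closed shapes.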
Pages: 307-313
Number of pages: 7
Related papers
50 in total
  • [41] Another Point of View on Visual Speech Recognition
    Pouthier, Baptiste
    Pilati, Laurent
    Valenti, Giacomo
    Bouveyron, Charles
    Precioso, Frederic
    INTERSPEECH 2023, 2023, : 4089 - 4093
  • [42] DESIGNING RELEVANT FEATURES FOR VISUAL SPEECH RECOGNITION
    Benhaim, Eric
    Sahbi, Hichem
    Vitte, Guillaume
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 2420 - 2424
  • [43] Visual speech recognition by recurrent neural networks
    Rabi, G
    Lu, SW
    JOURNAL OF ELECTRONIC IMAGING, 1998, 7 (01) : 61 - 69
  • [44] Visual Speech Recognition in a Driver Assistance System
    Ivanko, Denis
    Ryumin, Dmitry
    Kashevnik, Alexey
    Axyonov, Alexandr
    Karpov, Alexey
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 1131 - 1135
  • [45] DEEP WORD EMBEDDINGS FOR VISUAL SPEECH RECOGNITION
    Stafylakis, Themos
    Tzimiropoulos, Georgios
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4974 - 4978
  • [46] Effects of distance on visual and audiovisual speech recognition
    Jordan, TR
    Sergeant, P
    LANGUAGE AND SPEECH, 2000, 43 : 107 - 124
  • [47] Learning fuzzy rules for visual speech recognition
    Anwar, MA
    Baldwin, JF
    Martin, TP
    ADAPTIVE MULTIMEDIA RETRIEVAL, 2004, 3094 : 164 - 175
  • [48] CONTINUOUS VISUAL SPEECH RECOGNITION FOR MULTIMODAL FUSION
    Benhaim, Eric
    Sahbi, Hichem
    Vitte, Guillaume
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [49] A new manifold representation for visual speech recognition
    Yu, Dahai
    Ghita, Ovidiu
    Sutherland, Alistair
    Whelan, Paul F.
    IMVIP 2007: INTERNATIONAL MACHINE VISION AND IMAGE PROCESSING CONFERENCE, PROCEEDINGS, 2007, : 210 - 210
  • [50] The integration of auditory and visual attention in speech recognition
    Fussell, C
    Culling, JF
    BRITISH JOURNAL OF AUDIOLOGY, 2000, 34 (02) : 114