Boundary Descriptors for Visual Speech Recognition

Cited by: 0
Authors
Gupta, Deepika [1]
Singh, Preety [1]
Laxmi, V. [1]
Gaur, Manoj S. [1]
Affiliations
[1] Malaviya Natl Inst Technol, Dept Comp Engn, Jaipur, Rajasthan, India
DOI
10.1007/978-1-4471-2155-8_39
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Lip reading has attracted considerable research interest as a way to improve the performance of automatic speech recognition (Rabiner, L., Juang, B.: Fundamentals of speech recognition. Prentice Hall, New Jersey (1993)). The key issue in visual speech recognition is representing the information from the speech articulators as a feature vector. In this paper, we define the lips using the spatial coordinates of the lip contour as boundary descriptors. Traditionally, Principal Component Analysis (PCA), Discrete Cosine Transform (DCT) and Discrete Fourier Transform (DFT) techniques are applied to pixels from images of the mouth. In our paper, we apply PCA to the spatial points for data reduction. DCT and DFT are applied directly to the boundary descriptors to transform these spatial coordinates into the frequency domain. The new spatial and frequency domain feature vectors are used to classify the spoken word. An accuracy of 53.4% is obtained in the spatial domain and 54.3% in the frequency domain, which is comparable to results reported in the literature.
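The frequency-domain step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the contour coordinates are encoded or how many coefficients are kept, so the choices below are assumptions: each contour point is encoded as the complex number x + iy for the DFT (the usual Fourier-descriptor convention), the DCT-II is applied to the x and y sequences separately, and the function names, the `keep` parameter and the elliptical test contour are all hypothetical. The PCA step is not shown.

```python
import cmath
import math


def dft(points):
    """Normalised DFT of contour points encoded as x + iy (assumed encoding)."""
    n = len(points)
    z = [complex(x, y) for x, y in points]
    return [
        sum(z[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n)) / n
        for u in range(n)
    ]


def dct2(values):
    """Unnormalised DCT-II of a real-valued sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (k + 0.5) * u / n) for k, v in enumerate(values))
        for u in range(n)
    ]


def frequency_features(points, keep=4):
    """Return (DFT magnitudes, DCT coefficients), truncated to `keep` terms each.

    `keep` is a hypothetical truncation length, not a value from the paper.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    dft_mags = [abs(c) for c in dft(points)[:keep]]
    dct_coeffs = dct2(xs)[:keep] + dct2(ys)[:keep]
    return dft_mags, dct_coeffs


# Hypothetical lip contour: 8 points on an ellipse centred at (10, 5).
contour = [
    (10 + 4 * math.cos(2 * math.pi * k / 8), 5 + 2 * math.sin(2 * math.pi * k / 8))
    for k in range(8)
]
dft_mags, dct_coeffs = frequency_features(contour)
```

In this sketch the zeroth DFT coefficient carries only the contour centroid, so a position-invariant feature vector would typically drop it; whether the authors do so is not stated in the abstract.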
Pages: 307-313
Number of pages: 7
Related papers
50 items in total
  • [1] Supervised Kernel Descriptors for Visual Recognition
    Wang, Peng
    Wang, Jingdong
    Zeng, Gang
    Xu, Weiwei
    Zha, Hongbin
    Li, Shipeng
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 2858 - 2865
  • [2] Application of affine-invariant Fourier descriptors to lipreading for audio-visual speech recognition
    Gurbuz, S
    Tufekci, Z
    Patterson, E
    Gowdy, JN
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 177 - 180
  • [3] Efficiency of chosen speech descriptors in relation to emotion recognition
    Kaminska, Dorota
    Sapinski, Tomasz
    Anbarjafari, Gholamreza
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2017,
  • [4] Efficiency of chosen speech descriptors in relation to emotion recognition
    Dorota Kamińska
    Tomasz Sapiński
    Gholamreza Anbarjafari
    EURASIP Journal on Audio, Speech, and Music Processing, 2017
  • [5] A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition
    Yu, Dahai
    Ghita, Ovidiu
    Sutherland, Alistair
    Whelan, Paul F.
    ADVANCES IN IMAGE AND VIDEO TECHNOLOGY, PROCEEDINGS, 2009, 5414 : 398 - 409
  • [6] Dense Trajectories and Motion Boundary Descriptors for Action Recognition
    Heng Wang
    Alexander Kläser
    Cordelia Schmid
    Cheng-Lin Liu
    International Journal of Computer Vision, 2013, 103 : 60 - 79
  • [7] Dense Trajectories and Motion Boundary Descriptors for Action Recognition
    Wang, Heng
    Klaeser, Alexander
    Schmid, Cordelia
    Liu, Cheng-Lin
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2013, 103 (01) : 60 - 79
  • [8] Visual Place Recognition Using Landmark Distribution Descriptors
    Panphattarasap, Pilailuck
    Calway, Andrew
    COMPUTER VISION - ACCV 2016, PT IV, 2017, 10114 : 487 - 502
  • [9] Place Recognition using Kernel Visual Keyword Descriptors
    Ali, Abbas M.
    Rashid, Tarik A.
    2015 SAI INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS), 2015, : 921 - 926
  • [10] Vector Semantic Representations as Descriptors for Visual Place Recognition
    Neubert, Peer
    Schubert, Stefan
    Schlegel, Kenny
    Protzel, Peter
    ROBOTICS: SCIENCE AND SYSTEM XVII, 2021,