Boundary Descriptors for Visual Speech Recognition

Authors
Gupta, Deepika [1]
Singh, Preety [1]
Laxmi, V. [1]
Gaur, Manoj S. [1]
Affiliations
[1] Malaviya Natl Inst Technol, Dept Comp Engn, Jaipur, Rajasthan, India
Keywords
DOI
10.1007/978-1-4471-2155-8_39
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Lip reading has attracted considerable research interest as a way to improve the performance of automatic speech recognition (Rabiner, L., Juang, B.: Fundamentals of speech recognition. Prentice Hall, New Jersey (1993)). The key issue in visual speech recognition is how to represent the information from the speech articulators as a feature vector. In this paper, we describe the lips using the spatial coordinates of the lip contour as boundary descriptors. Traditionally, Principal Component Analysis (PCA), Discrete Cosine Transform (DCT) and Discrete Fourier Transform (DFT) techniques are applied to pixels from images of the mouth. Here, we instead apply PCA to the spatial points for data reduction, and apply DCT and DFT directly to the boundary descriptors to transform the spatial coordinates into the frequency domain. The resulting spatial and frequency domain feature vectors are used to classify the spoken word. An accuracy of 53.4% is obtained in the spatial domain and 54.3% in the frequency domain, which is comparable to results reported in the literature.
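The feature pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the number of contour points, the number of retained components or coefficients, or the classifier, so every such value below is an assumption. The function names (synthetic_lip_contour, pca_reduce, dft_descriptor, dct_descriptor) are hypothetical, and an ellipse stands in for a tracked lip contour.

```python
import numpy as np

def synthetic_lip_contour(n_points=16, width=2.0, height=1.0):
    """Hypothetical stand-in for a tracked lip contour: points on an ellipse."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.column_stack((width * np.cos(t), height * np.sin(t)))

def pca_reduce(samples, n_components):
    """Project row-vector samples onto their top principal components."""
    centered = samples - samples.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def dft_descriptor(contour, n_coeffs):
    """Magnitudes of the first DFT coefficients of the complex boundary x + jy."""
    boundary = contour[:, 0] + 1j * contour[:, 1]
    return np.abs(np.fft.fft(boundary))[:n_coeffs]

def dct_descriptor(contour, n_coeffs):
    """First unnormalized DCT-II coefficients of the x and y sequences."""
    n = contour.shape[0]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))  # shape (n_coeffs, n)
    # (n_coeffs, 2) -> x coefficients followed by y coefficients.
    return (basis @ contour).T.ravel()

# Assumed toy data: 20 frames whose mouth opening (ellipse height) varies.
rng = np.random.default_rng(0)
frames = np.stack([synthetic_lip_contour(height=h)
                   for h in rng.uniform(0.5, 1.5, size=20)])

# Spatial-domain features: PCA on the flattened (x, y) contour coordinates.
spatial = pca_reduce(frames.reshape(len(frames), -1), n_components=4)

# Frequency-domain features: DFT and DCT applied to the boundary itself.
frequency = np.stack([np.concatenate((dft_descriptor(c, 4), dct_descriptor(c, 4)))
                      for c in frames])
```

Either feature matrix could then be passed to any word classifier. The point of the sketch is only that the transforms operate on a few contour coordinates per frame rather than on the pixels of the mouth image.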
Pages: 307-313
Page count: 7