Boundary Descriptors for Visual Speech Recognition

Times cited: 0

Authors:

Gupta, Deepika [1]

Singh, Preety [1]

Laxmi, V. [1]

Gaur, Manoj S. [1]

Affiliation:

[1] Malaviya Natl Inst Technol, Dept Comp Engn, Jaipur, Rajasthan, India

Source:

COMPUTER AND INFORMATION SCIENCES II | 2012


DOI:

10.1007/978-1-4471-2155-8_39

CLC number (Chinese Library Classification):

TP301 [Theory, methods]

Subject classification code:

081202

Abstract:

Lip reading has attracted considerable research interest as a means of improving the performance of automatic speech recognition (Rabiner, L., Juang, B.: Fundamentals of speech recognition. Prentice Hall, New Jersey (1993)). The key issue in visual speech recognition is how to represent the information carried by the speech articulators as a feature vector. In this paper, we represent the lips by the spatial coordinates of the lip contour, which serve as boundary descriptors. Traditionally, Principal Component Analysis (PCA), the Discrete Cosine Transform (DCT) and the Discrete Fourier Transform (DFT) are applied to the pixels of mouth images. Here, we instead apply PCA to the contour points for data reduction, and apply the DCT and DFT directly to the boundary descriptors to transform these spatial coordinates into the frequency domain. The resulting spatial- and frequency-domain feature vectors are used to classify the spoken word. Accuracies of 53.4% in the spatial domain and 54.3% in the frequency domain are obtained, which are comparable to results reported in the literature.
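The abstract does not give formulas for its spatial- and frequency-domain features, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' method. It assumes a lip contour is an ordered list of N (x, y) points. It takes the DFT of the complex sequence z_k = x_k + i*y_k (the classical Fourier-descriptor encoding of a boundary), applies an orthonormal DCT-II separately to the x and y sequences, and fits PCA across frames on the flattened coordinates. All function names and the synthetic ellipse "lips" are hypothetical.

```python
import cmath
import math
import random

# Illustrative sketch, NOT the authors' code: the helper names, the complex
# encoding z_k = x_k + i*y_k, and the synthetic ellipse "lips" are assumptions.
# A lip contour is a list of N (x, y) points ordered around the boundary.


def ellipse_contour(a, b, n, cx=0.0, cy=0.0):
    """Hypothetical stand-in for a tracked lip contour with n points."""
    return [(cx + a * math.cos(2 * math.pi * k / n),
             cy + b * math.sin(2 * math.pi * k / n)) for k in range(n)]


def flatten(contour):
    """Spatial-domain vector [x_0, ..., x_{N-1}, y_0, ..., y_{N-1}]."""
    return [p[0] for p in contour] + [p[1] for p in contour]


def dft_features(contour, n_coeffs):
    """Magnitudes |Z_u| for u = 1..n_coeffs (n_coeffs < N) of z_k = x_k + i*y_k.

    Z_0 is N times the centroid, so skipping it makes the features
    invariant to translating the whole contour.
    """
    n = len(contour)
    z = [complex(x, y) for x, y in contour]
    return [abs(sum(z[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n)))
            for u in range(1, n_coeffs + 1)]


def dct_features(contour, n_coeffs):
    """First n_coeffs orthonormal DCT-II coefficients of x, then of y."""
    n = len(contour)

    def dct2(seq):
        return [(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
                * sum(v * math.cos(math.pi * (k + 0.5) * u / n) for k, v in enumerate(seq))
                for u in range(n_coeffs)]

    return dct2([p[0] for p in contour]) + dct2([p[1] for p in contour])


def _orthonormalize(w, axes):
    """Remove components along earlier axes, then normalize (None if ~zero)."""
    for ax in axes:
        dot = sum(x * y for x, y in zip(w, ax))
        w = [x - dot * y for x, y in zip(w, ax)]
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w] if norm > 1e-12 else None


def pca_fit(samples, n_components, iters=200, seed=0):
    """Mean and leading principal axes of the samples' covariance matrix.

    Uses power iteration, re-orthogonalized against the axes already found;
    may return fewer axes if the remaining variance is numerically zero.
    Needs at least two samples (sample covariance divides by m - 1).
    """
    m, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / m for j in range(d)]
    cen = [[s[j] - mean[j] for j in range(d)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in cen) / (m - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(n_components):
        v = _orthonormalize([rng.gauss(0.0, 1.0) for _ in range(d)], axes)
        if v is None:
            break
        for _ in range(iters):
            w = _orthonormalize([sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)], axes)
            if w is None:  # no variance left outside the axes already found
                return mean, axes
            v = w
        axes.append(v)
    return mean, axes


def pca_project(sample, mean, axes):
    """Reduced feature vector: coordinates of the centered sample on each axis."""
    return [sum((s - mu) * a for s, mu, a in zip(sample, mean, ax)) for ax in axes]


if __name__ == "__main__":
    # Four synthetic frames whose vertical mouth opening b grows over time.
    frames = [ellipse_contour(2.0, b, 8) for b in (0.5, 1.0, 1.5, 2.0)]
    mean, axes = pca_fit([flatten(f) for f in frames], n_components=1)
    print("PCA:", [round(pca_project(flatten(f), mean, axes)[0], 3) for f in frames])
    print("DFT:", [round(v, 3) for v in dft_features(frames[-1], 3)])
    print("DCT:", [round(v, 3) for v in dct_features(frames[-1], 2)])
```

Keeping only a few low-order DFT or DCT coefficients retains the coarse shape of the contour (overall width and opening) while discarding fine boundary detail, which is one plausible reason such truncated transforms serve as compact features. The sign of each PCA axis is arbitrary, so projected values may come out negated between implementations.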


Pages: 307-313

Number of pages: 7
