Boundary Descriptors for Visual Speech Recognition

Cited by: 0

Authors:

Gupta, Deepika [1]

Singh, Preety [1]

Laxmi, V. [1]

Gaur, Manoj S. [1]

Affiliations:

[1] Malaviya Natl Inst Technol, Dept Comp Engn, Jaipur, Rajasthan, India

Source:

COMPUTER AND INFORMATION SCIENCES II | 2012

DOI:

10.1007/978-1-4471-2155-8_39

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Lip reading has attracted considerable research interest as a way to improve the performance of automatic speech recognition (Rabiner, L., Juang, B.: Fundamentals of speech recognition. Prentice Hall, New Jersey (1993)). The key issue in visual speech recognition is how to represent the information from the speech articulators as a feature vector. In this paper, we describe the lips using the spatial coordinates of the lip contour as boundary descriptors. Traditionally, Principal Component Analysis (PCA), Discrete Cosine Transform (DCT) and Discrete Fourier Transform (DFT) techniques are applied to pixels from images of the mouth. Here, we instead apply PCA to the spatial points for data reduction, and apply DCT and DFT directly to the boundary descriptors to transform the spatial coordinates into the frequency domain. The resulting spatial and frequency domain feature vectors are used to classify the spoken word. An accuracy of 53.4% is obtained in the spatial domain and 54.3% in the frequency domain, which is comparable to results reported in the literature.
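
The abstract names three transforms applied to lip contour coordinates but gives no implementation details. The following is only a minimal sketch of what such feature extraction could look like, not the authors' method. Everything in it is an assumption made for illustration: the synthetic elliptical contour from the hypothetical helper `make_contour`, the number of contour points, the number of retained coefficients, and the choice of a complex `x + iy` encoding for the DFT. The classification step is omitted.

```python
import numpy as np

def make_contour(n_points=16, width=1.0, height=0.5):
    """Synthetic elliptical contour; a hypothetical stand-in for tracked lip points."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.column_stack([width * np.cos(t), height * np.sin(t)])

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)                       # center each coordinate
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def dft_descriptor(contour, n_coeffs):
    """Magnitudes of the first DFT coefficients of the complex boundary x + iy."""
    z = contour[:, 0] + 1j * contour[:, 1]
    return np.abs(np.fft.fft(z))[:n_coeffs]

def dct_descriptor(contour, n_coeffs):
    """First unnormalized DCT-II coefficients of the x and y coordinate sequences."""
    n = contour.shape[0]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    return (basis @ contour).T.ravel()            # x coefficients, then y coefficients

# Assumed toy data: 10 frames in which the mouth opens and widens slightly.
heights = np.linspace(0.2, 0.8, 10)
widths = np.linspace(1.0, 1.2, 10)
frames = np.stack([make_contour(width=w, height=h).ravel()
                   for w, h in zip(widths, heights)])

spatial_features = pca_reduce(frames, n_components=2)   # spatial domain
dft_features = dft_descriptor(make_contour(), n_coeffs=4)  # frequency domain
dct_features = dct_descriptor(make_contour(), n_coeffs=4)  # frequency domain
```

In this sketch each frame contributes one flattened coordinate vector to PCA, while the DCT and DFT operate on a single contour at a time; how frames are combined into a per-word feature vector is left open because the abstract does not say.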

Pages: 307-313

Page count: 7
