Boundary Descriptors for Visual Speech Recognition

Cited by: 0

Authors:

Gupta, Deepika [1]

Singh, Preety [1]

Laxmi, V. [1]

Gaur, Manoj S. [1]

Affiliations:

[1] Malaviya National Institute of Technology, Department of Computer Engineering, Jaipur, Rajasthan, India

Source:

Computer and Information Sciences II | 2012

DOI:

10.1007/978-1-4471-2155-8_39

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Lip reading has attracted considerable research interest as a means of improving the performance of automatic speech recognition (Rabiner, L., Juang, B.: Fundamentals of Speech Recognition. Prentice Hall, New Jersey (1993)). The key issue in visual speech recognition is how to represent the information from the speech articulators as a feature vector. In this paper, we describe the lips using the spatial coordinates of the lip contour as boundary descriptors. Traditionally, Principal Component Analysis (PCA), Discrete Cosine Transform (DCT) and Discrete Fourier Transform (DFT) techniques are applied to pixels from images of the mouth. In this work, we instead apply PCA to the spatial points for data reduction, and we apply DCT and DFT directly to the boundary descriptors to transform the spatial coordinates into the frequency domain. The resulting spatial and frequency domain feature vectors are used to classify the spoken word. An accuracy of 53.4% is obtained in the spatial domain and 54.3% in the frequency domain, which is comparable to results reported in the literature.
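
The frequency domain step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the contour points are ordered, how many coefficients are kept, or which DCT variant is used. The sketch assumes the contour is an ordered list of (x, y) points, takes the DFT of the complex sequence s(k) = x(k) + j*y(k), and takes an unnormalized DCT-II of the x and y sequences separately. The elliptical contour and all function names are hypothetical stand-ins for real lip contour data.

```python
import cmath
import math


def ellipse_contour(n_points, a, b, cx=0.0, cy=0.0):
    """Synthetic stand-in for a lip contour: n_points sampled on an ellipse."""
    return [
        (cx + a * math.cos(2 * math.pi * k / n_points),
         cy + b * math.sin(2 * math.pi * k / n_points))
        for k in range(n_points)
    ]


def dft_descriptors(contour):
    """DFT (scaled by 1/N) of the complex boundary sequence s(k) = x(k) + j*y(k)."""
    n = len(contour)
    s = [complex(x, y) for x, y in contour]
    return [
        sum(s[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n)) / n
        for u in range(n)
    ]


def dct2(seq):
    """Unnormalized DCT-II of a real sequence."""
    n = len(seq)
    return [
        sum(seq[k] * math.cos(math.pi * (k + 0.5) * u / n) for k in range(n))
        for u in range(n)
    ]


def dct_descriptors(contour, n_coeffs):
    """First n_coeffs DCT coefficients of the x sequence, then of the y sequence."""
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    return dct2(xs)[:n_coeffs] + dct2(ys)[:n_coeffs]


# Assumed example values: 16 contour points, semi-axes 3 and 1, centre (5, 2).
contour = ellipse_contour(16, a=3.0, b=1.0, cx=5.0, cy=2.0)
fd = dft_descriptors(contour)
dct_feat = dct_descriptors(contour, n_coeffs=4)
```

For this synthetic ellipse, the zero-frequency DFT coefficient equals the contour centroid and only two other coefficients are non-zero, which shows how a smooth boundary concentrates into a few low-frequency terms. Truncating to the leading coefficients is one plausible way to form a compact feature vector; the truncation length used here is an arbitrary choice.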

Pages: 307-313

Page count: 7
