Visual speech feature extraction for improved speech recognition

Cited: 0
Authors
Zhang, X [1 ]
Mersereau, RM [1 ]
Clements, M [1 ]
Broun, CC [1 ]
Affiliation
[1] Georgia Inst Technol, Ctr Signal & Image Proc, Atlanta, GA 30332 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
Mainstream automatic speech recognition has focused almost exclusively on the acoustic signal. The performance of these systems degrades considerably in the real world in the presence of noise. On the other hand, most human listeners, both hearing-impaired and normal-hearing, make use of visual information to improve speech perception in acoustically hostile environments. Motivated by humans' ability to lipread, the visual component is considered to yield information that is not always present in the acoustic signal and to enable improved accuracy over purely acoustic systems, especially in noisy environments. In this paper, we investigate the usefulness of visual information in speech recognition. We first present a method for automatically locating and extracting visual speech features from a talking person in color video sequences. We then develop a recognition engine to train on and recognize sequences of visual parameters for the purpose of speech recognition. We particularly explore the impact of various combinations of visual features on recognition accuracy. We conclude that the inner lip contour features, together with information about the visibility of the tongue and teeth, significantly improve performance over outer-contour-only features in both speaker-dependent and speaker-independent recognition tasks.
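The abstract compares recognition using outer-contour-only features against features augmented with the inner lip contour and tongue/teeth visibility. As a minimal illustrative sketch (not the authors' actual feature extraction; all names and the mock contour data are hypothetical), a per-frame visual feature vector combining these cues might be assembled like this:

```python
import numpy as np

def visual_feature_vector(outer_contour, inner_contour=None,
                          teeth_visible=None, tongue_visible=None):
    """Assemble a per-frame visual speech feature vector.

    Hypothetical illustration: start from outer lip contour points and
    optionally append inner contour points plus binary visibility cues
    for teeth and tongue, mirroring the feature combinations the paper
    compares.
    """
    parts = [np.asarray(outer_contour, dtype=float).ravel()]
    if inner_contour is not None:
        parts.append(np.asarray(inner_contour, dtype=float).ravel())
    if teeth_visible is not None:
        parts.append(np.array([float(teeth_visible)]))
    if tongue_visible is not None:
        parts.append(np.array([float(tongue_visible)]))
    return np.concatenate(parts)

# Mock lip-contour points for one frame (x, y pairs, made up for illustration)
outer = [(0.0, 1.0), (2.0, 1.5), (4.0, 1.0), (2.0, 0.5)]
inner = [(1.0, 1.0), (3.0, 1.0)]

v_outer = visual_feature_vector(outer)                      # outer contour only
v_full = visual_feature_vector(outer, inner,                # augmented features
                               teeth_visible=True, tongue_visible=False)
print(v_outer.shape, v_full.shape)  # (8,) (14,)
```

A sequence of such per-frame vectors would then be fed to a sequence recognizer (the paper's recognition engine) to compare the two feature sets.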
Pages: 1993 - 1996
Page count: 4