Visual speech feature extraction for improved speech recognition

Cited: 0
Authors
Zhang, X [1 ]
Mersereau, RM [1 ]
Clements, M [1 ]
Broun, CC [1 ]
Affiliation
[1] Georgia Inst Technol, Ctr Signal & Image Proc, Atlanta, GA 30332 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
Mainstream automatic speech recognition has focused almost exclusively on the acoustic signal. The performance of these systems degrades considerably in the real world in the presence of noise. On the other hand, most human listeners, both hearing-impaired and normal-hearing, make use of visual information to improve speech perception in acoustically hostile environments. Motivated by humans' ability to lipread, the visual component is considered to yield information that is not always present in the acoustic signal and to enable improved accuracy over purely acoustic systems, especially in noisy environments. In this paper, we investigate the usefulness of visual information in speech recognition. We first present a method for automatically locating and extracting visual speech features from a talking person in color video sequences. We then develop a recognition engine to train on and recognize sequences of visual parameters for the purpose of speech recognition. We particularly explore the impact of various combinations of visual features on recognition accuracy. We conclude that the inner lip contour features, together with information about the visibility of the tongue and teeth, significantly improve performance over outer-contour-only features in both speaker-dependent and speaker-independent recognition tasks.
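The abstract compares recognition using outer-contour-only features against features augmented with the inner lip contour and tongue/teeth visibility. As a minimal illustrative sketch (not the authors' actual feature extraction; all names and the mock contour data are hypothetical), a per-frame visual feature vector combining these cues might be assembled like this:

```python
import numpy as np

def visual_feature_vector(outer_contour, inner_contour=None,
                          teeth_visible=None, tongue_visible=None):
    """Assemble a per-frame visual speech feature vector.

    Hypothetical illustration: start from outer lip contour points and
    optionally append inner contour points plus binary visibility cues
    for teeth and tongue, mirroring the feature combinations the paper
    compares.
    """
    parts = [np.asarray(outer_contour, dtype=float).ravel()]
    if inner_contour is not None:
        parts.append(np.asarray(inner_contour, dtype=float).ravel())
    if teeth_visible is not None:
        parts.append(np.array([float(teeth_visible)]))
    if tongue_visible is not None:
        parts.append(np.array([float(tongue_visible)]))
    return np.concatenate(parts)

# Mock lip-contour points for one frame (x, y pairs, made up for illustration)
outer = [(0.0, 1.0), (2.0, 1.5), (4.0, 1.0), (2.0, 0.5)]
inner = [(1.0, 1.0), (3.0, 1.0)]

v_outer = visual_feature_vector(outer)                      # outer contour only
v_full = visual_feature_vector(outer, inner,                # augmented features
                               teeth_visible=True, tongue_visible=False)
print(v_outer.shape, v_full.shape)  # (8,) (14,)
```

A sequence of such per-frame vectors would then be fed to a sequence recognizer (the paper's recognition engine) to compare the two feature sets.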
Pages: 1993 - 1996
Page count: 4