Visual speech feature extraction for improved speech recognition

Cited by: 0
Authors
Zhang, X [1]
Mersereau, RM [1]
Clements, M [1]
Broun, CC [1]
Affiliations
[1] Georgia Inst Technol, Ctr Signal & Image Proc, Atlanta, GA 30332 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Mainstream automatic speech recognition has focused almost exclusively on the acoustic signal. The performance of these systems degrades considerably in the real world in the presence of noise. On the other hand, most human listeners, both hearing-impaired and normal-hearing, make use of visual information to improve speech perception in acoustically hostile environments. Motivated by humans' ability to lipread, the visual component is considered to yield information that is not always present in the acoustic signal and enables improved accuracy over totally acoustic systems, especially in noisy environments. In this paper, we investigate the usefulness of visual information in speech recognition. We first present a method for automatically locating and extracting visual speech features from a talking person in color video sequences. We then develop a recognition engine to train and recognize sequences of visual parameters for the purpose of speech recognition. We particularly explore the impact of various combinations of visual features on the recognition accuracy. We conclude that the inner lip contour features, together with the information about the visibility of the tongue and teeth, significantly improve the performance over using outer-contour-only features in both speaker-dependent and speaker-independent recognition tasks.
Pages: 1993-1996
Page count: 4