Visual speech feature extraction for improved speech recognition

Citations: 0
Authors
Zhang, X [1]
Mersereau, RM [1]
Clements, M [1]
Broun, CC [1]
Affiliations
[1] Georgia Inst Technol, Ctr Signal & Image Proc, Atlanta, GA 30332 USA
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Mainstream automatic speech recognition has focused almost exclusively on the acoustic signal. The performance of these systems degrades considerably in the real world in the presence of noise. On the other hand, most human listeners, both hearing-impaired and normal-hearing, make use of visual information to improve speech perception in acoustically hostile environments. Motivated by humans' ability to lipread, the visual component is considered to yield information that is not always present in the acoustic signal and to enable improved accuracy over purely acoustic systems, especially in noisy environments. In this paper, we investigate the usefulness of visual information in speech recognition. We first present a method for automatically locating and extracting visual speech features from a talking person in color video sequences. We then develop a recognition engine to train and recognize sequences of visual parameters for the purpose of speech recognition. We particularly explore the impact of various combinations of visual features on recognition accuracy. We conclude that the inner lip contour features, together with information about the visibility of the tongue and teeth, significantly improve performance over outer-contour-only features in both speaker-dependent and speaker-independent recognition tasks.
Pages: 1993-1996
Page count: 4