Automatic Visual Feature Extraction for Mandarin Audio-Visual Speech Recognition

Cited by: 2
Authors
Pao, Tsang-Long [1]
Liao, Wen-Yuan [1]
Wu, Tsan-Nung [1]
Lin, Ching-Yi [1]
Affiliation
[1] Tatung Univ, Dept Comp Sci & Engn, Taipei 104, Taiwan
Keywords
audio-visual speech recognition; audio speech feature; visual speech feature; WD-KNN classifier
DOI
10.1109/ICSMC.2009.5346011
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Automatic speech recognition (ASR) by machine has been an attractive research area for the past several decades. In recent years, many automatic speech-reading systems have been proposed that combine audio and visual speech features. In this paper, we propose an automatic approach for extracting visual features of the lips for use in an audio-visual speech recognition system. These features are important to the recognition system, especially in noisy conditions. The lip region is segmented using both color and edge information. We then establish a set of visual speech parameters and incorporate them into the recognizer. The WD-KNN classifier is used as the recognition engine. We report recognition performance with various visual features, including geometric and motion features of the lips, to explore their impact on recognition accuracy. Experimental results on Mandarin databases demonstrate that visual information is highly effective for improving recognition performance.
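The abstract names geometric and motion features of the lips but does not give their definitions. As a rough illustration only, the sketch below assumes that lip segmentation has already produced a binary mask per video frame, takes the bounding-box width, height, and pixel area of the mask as geometric features, and takes the frame-to-frame differences of those values as motion features. The function names `lip_geometry` and `lip_motion`, the mask representation, and the choice of features are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumption: a segmented frame is a binary mask (nonzero = lip pixel, 0 = background).
Mask = List[List[int]]


def lip_geometry(mask: Mask) -> Tuple[int, int, int]:
    """Return (width, height, area) of the lip region in one frame.

    Width and height are those of the bounding box around all lip pixels;
    area is the number of lip pixels. An empty mask yields (0, 0, 0).
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return (0, 0, 0)
    width = max(cols) - min(cols) + 1
    height = max(rows) - min(rows) + 1
    area = sum(1 for row in mask for v in row if v)
    return (width, height, area)


def lip_motion(masks: List[Mask]) -> List[Tuple[int, ...]]:
    """Return frame-to-frame differences of the geometric features."""
    geo = [lip_geometry(m) for m in masks]
    return [tuple(b[i] - a[i] for i in range(3)) for a, b in zip(geo, geo[1:])]


# Two toy frames: a nearly closed mouth followed by an open mouth.
closed_frame = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
open_frame = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]

geometry = [lip_geometry(closed_frame), lip_geometry(open_frame)]
motion = lip_motion([closed_frame, open_frame])
```

In a real system the masks would come from the color-and-edge segmentation step, and the resulting feature vectors would be passed, together with audio features, to the classifier.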
Pages: 2936-2940
Number of pages: 5