Automatic Visual Feature Extraction for Mandarin Audio-Visual Speech Recognition

Cited by: 2

Authors:

Pao, Tsang-Long [1]

Liao, Wen-Yuan [1]

Wu, Tsan-Nung [1]

Lin, Ching-Yi [1]

Affiliation:

[1] Tatung Univ, Dept Comp Sci & Engn, Taipei 104, Taiwan

Source:

2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9 | 2009

Keywords:

audio-visual speech recognition; audio speech feature; visual speech feature; WD-KNN classifier

DOI:

10.1109/ICSMC.2009.5346011

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Automatic speech recognition (ASR) by machine has been an attractive research area for the past several decades. In recent years, many automatic speech-reading systems have been proposed that combine audio and visual speech features. In this paper, we propose an automatic approach for extracting visual features of the lips for use in an audio-visual speech recognition system. These features are important to the recognition system, especially in noisy conditions. The lip region is segmented using both color and edge information. We then establish a set of visual speech parameters and incorporate them into the recognizer. The WD-KNN classifier is used as the recognition engine. We report recognition performance using various visual features to explore their impact on recognition accuracy. These features include the geometry and the motion of the lips. Experimental results on Mandarin databases demonstrate that visual information is highly effective for improving recognition performance.

Pages: 2936-2940

Number of pages: 5

50 items in total

[1] An audio-visual speech recognition with a new mandarin audio-visual database
Liao, Wen-Yuan
Pao, Tsang-Long
Chen, Yu-Te
Chang, Tsun-Wei
[J]. INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 19 - +
[2] A HYBRID VISUAL FEATURE EXTRACTION METHOD FOR AUDIO-VISUAL SPEECH RECOGNITION
Wu, Guanyong
Zhu, Jie
Xu, Haihua
[J]. 2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 1829 - 1832
[3] Information Theoretic Feature Extraction for Audio-Visual Speech Recognition
Gurban, Mihai
Thiran, Jean-Philippe
[J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2009, 57 (12) : 4765 - 4776
[4] Cross-Domain Deep Visual Feature Generation for Mandarin Audio-Visual Speech Recognition
Su, Rongfeng
Liu, Xunying
Wang, Lan
Yang, Jingzhou
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 185 - 197
[5] Appearance and shape-based hybrid visual feature extraction: toward audio-visual automatic speech recognition
Debnath, Saswati
Roy, Pinki
[J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2021, 15 (01) : 25 - 32
[6] A cascade gray-stereo visual feature extraction method for visual and audio-visual speech recognition
Sui, Chao
Togneri, Roberto
Bennamoun, Mohammed
[J]. SPEECH COMMUNICATION, 2017, 90 : 26 - 38
[7] MANDARIN AUDIO-VISUAL SPEECH RECOGNITION WITH EFFECTS TO THE NOISE AND EMOTION
Pao, Tsang-Long
Liao, Wen-Yuan
Chen, Yu-Te
Wu, Tsan-Nung
[J]. INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2010, 6 (02) : 711 - 723
[8] A robust visual feature extraction based BTSM-LDA for audio-visual speech recognition
Lv, Guoyun
Zhao, Rongchun
Jiang, Dongmei
Li, Yan
Sahli, H.
[J]. 2007 SECOND INTERNATIONAL CONFERENCE IN COMMUNICATIONS AND NETWORKING IN CHINA, VOLS 1 AND 2, 2007, : 1044 - +
[9] Relevant feature selection for audio-visual speech recognition
Drugman, Thomas
Gurban, Mihai
Thiran, Jean-Philippe
[J]. 2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2007, : 179 - +
[10] Audio-Visual Automatic Speech Recognition for Connected Digits
Wang, Xiaoping
Hao, Yufeng
Fu, Degang
Yuan, Chunwei
[J]. 2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL III, PROCEEDINGS, 2008, : 328 - +
