USING MULTIPLE VISUAL TANDEM STREAMS IN AUDIO-VISUAL SPEECH RECOGNITION

被引：0

作者：

Topkaya, Ibrahim Saygin ^{[1
]}

Erdogan, Hakan ^{[1
]}

机构：

[1] Sabanci Univ, Fac Engn & Nat Sci, Vis & Pattern Anal Lab, Istanbul, Turkey

来源：

2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2011年

关键词：

Audio-Visual Speech Recognition; Hidden Markov Models; Tandem Approach; Support Vector Machines; Neural Networks; Coupled Hidden Markov Models;

D O I：

暂无

中图分类号：

O42 [声学];

学科分类号：

070206 ; 082403 ;

摘要：

The method which is called the "tandem approach" in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of using visual tandem features in audio-visual speech recognition using a novel setup which uses multiple classifiers to obtain multiple visual tandem features. We adopt the approach of multi-stream hidden Markov models where visual tandem features from two different classifiers are considered as additional streams in the model. It is shown in our experiments that using multiple visual tandem features improve the recognition accuracy in various noise conditions. In addition, in order to handle asynchrony between audio and visual observations, we employ coupled hidden Markov models and obtain improved performance as compared to the synchronous model.

引用

下载

页码：4988 / 4991

页数：4

共 50 条

[1] Bimodality Streams Integration for Audio-Visual Speech Recognition Systems
Seman, Noraini
Roslan, Rosniza
Jamil, Nursuriati
Ardi, Norizah
HYBRID INTELLIGENT SYSTEMS, HIS 2015, 2016, 420 : 127 - 139
[2] Fusing data streams in continuous audio-visual speech recognition
Rothkrantz, LJM
Wojdel, JC
Wiggers, P
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2005, 3658 : 33 - 44
[3] A Robust Audio-visual Speech Recognition Using Audio-visual Voice Activity Detection
Tamura, Satoshi
Ishikawa, Masato
Hashiba, Takashi
Takeuchi, Shin'ichi
Hayamizu, Satoru
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2702 - +
[4] An audio-visual speech recognition with a new mandarin audio-visual database
Liao, Wen-Yuan
Pao, Tsang-Long
Chen, Yu-Te
Chang, Tsun-Wei
INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 19 - +
[5] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
Hwang, Jung-Wook
Park, Jeongkyun
Park, Rae-Hong
Park, Hyung-Min
APPLIED ACOUSTICS, 2023, 211
[6] Audio-visual speech recognition using MPEGA compliant visual features
Aleksic, PS
Williams, JJ
Wu, ZL
Katsaggelos, AK
EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2002, 2002 (11) : 1213 - 1227
[7] Deep Audio-Visual Speech Recognition
Afouras, Triantafyllos
Chung, Joon Son
Senior, Andrew
Vinyals, Oriol
Zisserman, Andrew
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 8717 - 8727
[8] MULTIPOSE AUDIO-VISUAL SPEECH RECOGNITION
Estellers, Virginia
Thiran, Jean-Philippe
19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1065 - 1069
[9] Audio-visual integration for speech recognition
Kober, R
Harz, U
NEUROLOGY PSYCHIATRY AND BRAIN RESEARCH, 1996, 4 (04) : 179 - 184
[10] An audio-visual speech recognition system for testing new audio-visual databases
Pao, Tsang-Long
Liao, Wen-Yuan
VISAPP 2006: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2006, : 192 - +

← 1 2 3 4 5 →