Multi-stream product modal audio-visual integration strategy for robust adaptive speech recognition

Cited by: 0

Authors:

Gurbuz, S [1]

Tufekci, Z [1]

Patterson, E [1]

Gowdy, JN [1]

Affiliation:

[1] Clemson Univ, Dept Elect & Comp Engn, Clemson, SC 29634 USA

Source:

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS | 2002

Keywords:

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

In this paper, we extend an existing audio-only automatic speech recognizer to implement a multi-stream audio-visual automatic speech recognition (AV-ASR) system. Our method forms a multi-stream feature vector from audio-visual speech data, computes the statistical modal parameter probabilities on the basis of the multi-stream audio-visual features, and performs dynamic programming jointly on the multi-stream product modal Hidden Markov Models (MSPM-HMMs), using a stream-weighting value that depends on the noise type and the signal-to-noise ratio (SNR). Experimental results are presented for an isolated word recognition task with eight different noise types from the NOISEX database at several SNR values. The proposed system reduces the word error rate (WER), averaged over several SNR values and noise types, from 55.9% with the audio-only recognizer and 7.9% with the late-integration audio-visual recognizer to 2.6% on the validation set.
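The stream-weighting idea in the abstract can be illustrated with a minimal sketch. It assumes the common multi-stream form in which the audio and visual log-likelihoods are combined as lam * log p_audio + (1 - lam) * log p_video, with the audio weight lam looked up from the noise type and SNR. The table values, the function names, and the nearest-SNR lookup are all hypothetical and are not taken from the paper. The sketch also scores whole words, which is closer to late integration. The paper instead applies the weighting inside joint dynamic programming over the product HMMs, which is not reproduced here.

```python
# Hypothetical lookup table: (noise_type, snr_db) -> audio stream weight in [0, 1].
# The numbers are illustrative placeholders, not values reported in the paper.
AUDIO_WEIGHTS = {
    ("babble", 0): 0.3,
    ("babble", 20): 0.8,
    ("white", 0): 0.2,
    ("white", 20): 0.7,
}


def audio_weight(noise_type, snr_db, default=0.5):
    """Return the audio weight tabulated for the SNR nearest to snr_db.

    Falls back to `default` when the noise type is not in the table.
    """
    candidates = [
        (abs(table_snr - snr_db), weight)
        for (table_noise, table_snr), weight in AUDIO_WEIGHTS.items()
        if table_noise == noise_type
    ]
    if not candidates:
        return default
    return min(candidates)[1]


def combined_log_likelihood(log_p_audio, log_p_video, lam):
    """Weighted sum of per-stream log-likelihoods (weights sum to 1)."""
    return lam * log_p_audio + (1.0 - lam) * log_p_video


def recognize(audio_scores, video_scores, noise_type, snr_db):
    """Pick the word with the highest weighted score.

    audio_scores and video_scores map each candidate word to the
    log-likelihood assigned by the audio and visual models respectively.
    """
    lam = audio_weight(noise_type, snr_db)
    return max(
        audio_scores,
        key=lambda word: combined_log_likelihood(
            audio_scores[word], video_scores[word], lam
        ),
    )
```

With a table like this, the visual stream dominates the decision at low SNR and the audio stream dominates at high SNR, which is the adaptive behaviour the abstract describes.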

Pages: 2021-2024

Page count: 4
