Multi-Stream Asynchrony Dynamic Bayesian Network model for audio-visual continuous speech recognition

Cited by: 0

Authors:

Lv, Guoyun ^{[1]}

Jiang, Dongmei ^{[1,2]}

Zhao, Rongchun ^{[1]}

Jiang, Xiaoyue ^{[1]}

Sahli, H. ^{[2]}

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China

[2] Vrije Univ Brussel, Dept ETRO, B-1050 Brussels, Belgium

Source:

2007 14TH INTERNATIONAL WORKSHOP ON SYSTEMS, SIGNALS, & IMAGE PROCESSING & EURASIP CONFERENCE FOCUSED ON SPEECH & IMAGE PROCESSING, MULTIMEDIA COMMUNICATIONS & SERVICES | 2007

Keywords:

Dynamic Bayesian Networks; Bayesian Tangent Shape Model; audio-visual; speech recognition;

DOI:

Not available

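Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808; 0809;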

Abstract:

How best to describe the asynchrony between speech and lip motion is a key problem in audio-visual speech recognition. A Multi-Stream Asynchrony Dynamic Bayesian Network (MS-ADBN) model is proposed for audio-visual speech recognition. In this model, the audio and visual streams are synchronized at the word nodes; between word nodes, each stream has its own independent phone, phone transition, and observation vector nodes, and the word transition probability is determined jointly by the two streams. For each stream, each word is composed of its corresponding phones, and each phone is associated with an observation feature (an audio feature for the audio stream, a visual feature for the visual stream) whose probability is modeled by a Gaussian mixture model. Compared with a general multi-stream HMM, the MS-ADBN model describes the asynchrony of the audio and visual streams up to the word level. Experimental results on a continuous-digit audio-visual database show that, compared with the multi-stream HMM, the MS-ADBN model obtains an average improvement of 10.07% in the mismatched noise environment.
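The word-level synchrony constraint described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `StreamState` type, the `word_transition` function, and the deterministic "both streams must be finished" rule are assumptions made here for illustration, since the abstract only states that the word transition is determined by the two streams together.

```python
from dataclasses import dataclass


@dataclass
class StreamState:
    """Hypothetical per-frame state of one stream (audio or visual)."""
    phone_index: int        # position of the current phone within the word
    phone_transition: bool  # True if the stream leaves its current phone now


def word_transition(audio: StreamState, visual: StreamState,
                    n_audio_phones: int, n_visual_phones: int) -> bool:
    """Assumed rule: the word ends only when BOTH streams are leaving
    the last phone of the word. Inside the word, the two phone indices
    may differ, which is where the asynchrony lives."""
    audio_done = (audio.phone_transition
                  and audio.phone_index == n_audio_phones - 1)
    visual_done = (visual.phone_transition
                   and visual.phone_index == n_visual_phones - 1)
    return audio_done and visual_done


# Audio is already leaving its last phone, but the visual stream lags behind:
lagging = word_transition(StreamState(2, True), StreamState(1, True), 3, 3)
# Both streams leave their last phone at the same frame:
aligned = word_transition(StreamState(2, True), StreamState(2, True), 3, 3)
```

In this sketch the streams are free to sit in different phones within a word, and only the word boundary forces them back into step, which mirrors the word-level asynchrony the abstract contrasts with a general multi-stream HMM.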


Pages: 170 / +

Page count: 2
