Two-Level Bimodal Association for Audio-Visual Speech Recognition

Cited by: 0

Authors:

Lee, Jong-Seok [1]

Ebrahimi, Touradj [1]

Affiliation:

[1] Ecole Polytech Fed Lausanne, Multimedia Signal Proc Grp, CH-1015 Lausanne, Switzerland

Source:

ADVANCED CONCEPTS FOR INTELLIGENT VISION SYSTEMS, PROCEEDINGS | 2009 / Vol. 5807

Keywords:

SYNCHRONIZATION; FUSION;

DOI:

Not available

Chinese Library Classification:

TP18 [Theory of Artificial Intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, in which cross-modal association is considered at two levels. First, the acoustic and visual data streams are combined at the feature level using canonical correlation analysis, which addresses audio-visual synchronization and exploits the cross-modal correlation. Second, the information streams are integrated at the decision level so that the fusion adapts to the noise condition of the given speech datum. Experimental results demonstrate that the proposed method yields noise-robust recognition performance without a priori knowledge of the noise conditions of the speech data.
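As an illustration of the feature-level step only, the following is a minimal sketch of canonical correlation analysis written with NumPy. It is not the authors' implementation: the function names `cca` and `fuse_features`, the regularization constant `reg`, and the choice to concatenate the canonical variates are assumptions made for this example, and the audio and visual feature matrices are assumed to be already aligned frame by frame.

```python
import numpy as np


def cca(X, Y, n_components, reg=1e-6):
    """Canonical correlation analysis between two frame-aligned feature sets.

    X: (n_frames, p) acoustic features; Y: (n_frames, q) visual features.
    Returns projection matrices Wx (p, k), Wy (q, k) and the k canonical
    correlations. `reg` is an assumed ridge term for numerical stability.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance and cross-covariance matrices.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten both sides with Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy)        # Lx^{-1} Sxy
    T = np.linalg.solve(Ly, T.T).T      # Lx^{-1} Sxy Ly^{-T}

    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return Wx, Wy, s[:n_components]


def fuse_features(X, Y, Wx, Wy):
    """Concatenate the canonical variates of both streams (one assumed
    way to form a joint feature vector per frame)."""
    U = (X - X.mean(axis=0)) @ Wx
    V = (Y - Y.mean(axis=0)) @ Wy
    return np.hstack([U, V])
```

The Cholesky-based whitening avoids forming explicit matrix inverses. The decision-level stage described in the abstract is not covered here, since the abstract does not specify how the adaptation to noise is carried out.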

Pages: 133-144

Number of pages: 12
