Speaker independent audio-visual speech recognition

Cited by: 0

Authors:

Zhang, Y [1]

Levinson, S [1]

Huang, T [1]

Affiliations:

[1] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA

Source:

2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III | 2000

DOI:

Not available

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We present a general framework for integrating multimodal sensory signals for spatio-temporal pattern recognition. Statistical methods are used to model time-varying events in a collaborative manner such that the inter-modal co-occurrences are taken into account. We discuss various data fusion strategies, the modeling of inter-modal correlations, and the extraction of statistical parameters for multimodal models. A bimodal speech recognition system is implemented. A speaker-independent experiment is carried out to test the audio-visual speech recognizer under different kinds of noise from a noise database. Consistent improvements in word recognition accuracy (WRA) are achieved using a cross-validation scheme over different signal-to-noise ratios.

Pages: 1073-1076

Page count: 4

Related records: 50 in total

[31] An Attention Based Speaker-Independent Audio-Visual Deep Learning Model for Speech Enhancement
Sun, Zhongbo
Wang, Yannan
Cao, Li
[J]. MULTIMEDIA MODELING (MMM 2020), PT II, 2020, 11962 : 722 - 728
[32] Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments
Chao, Guan-Lin
Chan, William
Lane, Ian
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2120 - 2124
[33] Integration strategies for audio-visual speech processing: Applied to text-dependent speaker recognition
Lucey, S
Chen, TH
Sridharan, S
Chandran, V
[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2005, 7 (03) : 495 - 506
[34] A CLOSER LOOK AT AUDIO-VISUAL MULTI-PERSON SPEECH RECOGNITION AND ACTIVE SPEAKER SELECTION
Braga, Otavio
Siohan, Olivier
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6863 - 6867
[35] Integration of audio-visual information for multi-speaker multimedia speaker recognition
Yang, Jichen
Chen, Fangfan
Cheng, Yu
Lin, Pei
[J]. DIGITAL SIGNAL PROCESSING, 2024, 145
[36] Audio-visual speech recognition using LSTM and CNN
El Maghraby, Eslam E.
Gody, Amr M.
Farouk, M. Hesham
[J]. Recent Advances in Computer Science and Communications, 2021, 14 (06) : 2023 - 2039
[37] Building a data corpus for audio-visual speech recognition
Chitu, Alin G.
Rothkrantz, Leon J. M.
[J]. EUROMEDIA '2007, 2007, : 88 - 92
[38] Audio-visual fuzzy fusion for robust speech recognition
Malcangi, M.
Ouazzane, K.
Patel, P.
[J]. 2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013
[39] DARE: Deceiving Audio-Visual speech Recognition model
Mishra, Saumya
Gupta, Anup Kumar
Gupta, Puneet
[J]. KNOWLEDGE-BASED SYSTEMS, 2021, 232
[40] Audio-Visual Automatic Speech Recognition for Connected Digits
Wang, Xiaoping
Hao, Yufeng
Fu, Degang
Yuan, Chunwei
[J]. 2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL III, PROCEEDINGS, 2008, : 328 - +