Speaker independent audio-visual speech recognition

被引:0
|
作者
Zhang, Y [1 ]
Levinson, S [1 ]
Huang, T [1 ]
机构
[1] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
关键词
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We present a general framework of integrating multimodal sensory signals for spatial temporal pattern recognition. Statistical methods are used to model time varying events in a collaborative manner such that the inter-modal GO-occurrence are taken into account. We discuss various data fusion strategies, modeling of the inter-modal correlations and extracting statistical parameters for multi-modal models. A bimodal speech recognition system is implemented. A speaker-independent experiment is carried out to test the audio-visual speech recognizer under different kinds of noises from a noise database. Consistent improvements of word recognition accuracy (WRA) are achieved using a cross-validation scheme over different signal-to-noise ratios.
引用
收藏
页码:1073 / 1076
页数:4
相关论文
共 50 条
  • [1] Speaker independent audio-visual continuous speech recognition
    Liang, LH
    Liu, XX
    Zhao, YB
    Pi, XB
    Nefian, AV
    [J]. IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS, 2002, : A25 - A28
  • [2] Audio-Visual Speech Recognition in the Presence of a Competing Speaker
    Shao, Xu
    Barker, Jon
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1292 - 1295
  • [3] Audio-Visual Multilevel Fusion for Speech and Speaker Recognition
    Chetty, Girija
    Wagner, Michael
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 379 - 382
  • [4] An audio-visual approach to simultaneous-speaker speech recognition
    Patterson, EK
    Gowdy, JN
    [J]. 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO AND ELECTROACOUSTICS MULTIMEDIA SIGNAL PROCESSING, 2003, : 780 - 783
  • [5] The 'Audio-Visual Face Cover Corpus': Investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear
    Fecher, Natalie
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2247 - 2250
  • [6] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    [J]. APPLIED ACOUSTICS, 2023, 211
  • [7] DNN driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation
    Gogate, Mandar
    Adeel, Ahsan
    Marxer, Ricard
    Barker, Jon
    Hussain, Amir
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2723 - 2727
  • [8] An audio-visual speech recognition with a new mandarin audio-visual database
    Liao, Wen-Yuan
    Pao, Tsang-Long
    Chen, Yu-Te
    Chang, Tsun-Wei
    [J]. INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 19 - +
  • [9] Multifactor fusion for audio-visual speaker recognition
    Chetty, Girija
    Tran, Dat
    [J]. LECTURE NOTES IN SIGNAL SCIENCE, INTERNET AND EDUCATION (SSIP'07/MIV'07/DIWEB'07), 2007, : 70 - +
  • [10] Audio-visual system for robust speaker recognition
    Chen, Q
    Yang, JG
    Gou, J
    [J]. MLMTA '05: Proceedings of the International Conference on Machine Learning Models Technologies and Applications, 2005, : 97 - 103