Human Audio-Visual Consonant Recognition Analyzed with Three Bimodal Integration Models

Times cited: 0
Authors
Ma, Zhanyu [1]
Leijon, Arne [1]
Affiliation
[1] KTH Royal Inst Technol, Sound & Image Proc Lab, Stockholm, Sweden
Keywords
Audio-Visual Recognition; Fuzzy Logical Model of Perception; Post-Labelling Model; Hidden Markov Models; Multi-Stream Hidden Markov Models; Speech Recognition
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Using audio-visual (A-V) recordings, ten normal-hearing people took consonant recognition tests at different signal-to-noise ratios (SNRs). The AV recognition results were predicted with the fuzzy logical model of perception (FLMP) and the post-labelling integration model (POSTL). We also applied hidden Markov models (HMMs) and multi-stream HMMs (MSHMMs) to the recognition task. As expected, all models agree qualitatively with the finding that the benefit gained from the visual signal is larger at lower acoustic SNRs. However, the FLMP severely overestimates the AV integration result, while the POSTL model underestimates it. The MSHMMs combined the audio and visual streams efficiently, and the visual automatic speech recognizer could be adjusted to correspond to human visual performance. However, the audio automatic speech recognizer must be further improved to allow precise quantitative comparisons with human audio-visual performance.
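The two integration ideas named in the abstract can be made concrete. The following is a minimal sketch, not the authors' implementation, under two stated assumptions: (1) the FLMP is used in a parameter-free form in which the audio-only and visual-only confusion-matrix rows serve directly as the support values for each response (the FLMP is often fitted with free support parameters instead), and (2) the MSHMM combines the two streams by linearly weighting state log-likelihoods, a common stream-weighting scheme. The function names `flmp_predict` and `mshmm_log_emission` and the `audio_weight` parameter are hypothetical. The POSTL model is not sketched because its exact formulation is not given in this record.

```python
from typing import List

# A confusion matrix: one row per presented consonant, one column per response.
Matrix = List[List[float]]


def flmp_predict(p_audio: Matrix, p_visual: Matrix) -> Matrix:
    """Parameter-free FLMP prediction of audio-visual response probabilities.

    Assumption: each unimodal confusion-matrix row is used as the support
    for every response. The AV prediction for a response is the product of
    its audio and visual support, normalized over all responses.
    """
    predicted: Matrix = []
    for row_a, row_v in zip(p_audio, p_visual):
        # Multiplicative integration of the two modalities per response.
        support = [a * v for a, v in zip(row_a, row_v)]
        total = sum(support)
        # Rows where the modalities share no support are returned as all zeros.
        predicted.append([s / total if total > 0 else 0.0 for s in support])
    return predicted


def mshmm_log_emission(log_b_audio: float, log_b_visual: float,
                       audio_weight: float) -> float:
    """Stream-weighted state log-likelihood for a two-stream HMM.

    Assumption: log b(o) = w * log b_A(o_A) + (1 - w) * log b_V(o_V),
    with the hypothetical audio stream weight w in [0, 1].
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_visual
```

For example, an audio-only row of [0.6, 0.4] combined with a visual-only row of [0.9, 0.1] gives an FLMP prediction of about [0.931, 0.069]: the predicted correct-response rate exceeds both unimodal rates, which illustrates how a multiplicative rule can end up predicting more AV benefit than listeners actually show. In the MSHMM sketch, lowering `audio_weight` at low SNRs shifts reliance toward the visual stream.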
Pages: 820-823
Number of pages: 4