Human Audio-Visual Consonant Recognition Analyzed with Three Bimodal Integration Models

Cited by: 0
Authors
Ma, Zhanyu [1 ]
Leijon, Arne [1 ]
Affiliations
[1] KTH Royal Inst Technol, Sound & Image Proc Lab, Stockholm, Sweden
Keywords
Audio-visual recognition; Fuzzy Logical Model of Perception; Post-Labelling Model; Hidden Markov Models; Multi-Stream Hidden Markov Models; Speech Recognition
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Using audio-visual (AV) recordings, ten normal-hearing listeners took consonant recognition tests at different signal-to-noise ratios (SNR). The AV recognition results are predicted by the fuzzy logical model of perception (FLMP) and the post-labelling integration model (POSTL). We also applied hidden Markov models (HMMs) and multi-stream HMMs (MSHMMs) to the recognition task. As expected, all models agree qualitatively with the observation that the benefit gained from the visual signal is larger at lower acoustic SNRs. However, the FLMP severely overestimates the AV integration result, while the POSTL model underestimates it. Our automatic speech recognizers integrated the audio and visual streams efficiently. The visual automatic speech recognizer could be adjusted to correspond to human visual performance. The MSHMMs combine the audio and visual streams efficiently, but the audio automatic speech recognizer must be further improved to allow precise quantitative comparisons with human audio-visual performance.
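The abstract contrasts probability-level fusion (FLMP), decision-level fusion (POSTL), and weighted-stream fusion (MSHMMs). As a minimal sketch that is not taken from the paper, the Python snippet below illustrates the two probability-level rules on hypothetical per-consonant response probabilities; the function names, the three-class example, and the equal stream weights are assumptions made here for illustration. POSTL is omitted because it combines the discrete labels produced by each unimodal recognizer after classification rather than their probability distributions.

```python
import numpy as np

def flmp_integrate(p_audio, p_visual):
    """Fuzzy Logical Model of Perception: multiply the unimodal supports
    for each response category and renormalize."""
    fused = p_audio * p_visual
    return fused / fused.sum()

def mshmm_integrate(p_audio, p_visual, audio_weight=0.5):
    """Weighted product (geometric) combination of stream likelihoods, the
    form used in multi-stream HMMs; the exponent audio_weight is an assumed
    value, not one reported in the paper."""
    fused = (p_audio ** audio_weight) * (p_visual ** (1.0 - audio_weight))
    return fused / fused.sum()

# Hypothetical unimodal response probabilities over three consonant classes:
p_a = np.array([0.6, 0.3, 0.1])   # audio-only recognizer output at some SNR
p_v = np.array([0.4, 0.4, 0.2])   # visual-only (lipreading) recognizer output

print(flmp_integrate(p_a, p_v))    # multiplicative fusion sharpens agreement
print(mshmm_integrate(p_a, p_v))   # weighted fusion with equal stream weights
```

One intuition this sketch gives for the reported results: because the FLMP multiplies the unimodal supports, any agreement between the audio and visual channels is reinforced strongly, which is consistent with the abstract's finding that the FLMP tends to overestimate the measured AV benefit.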
Pages: 820-823
Page count: 4