LOOK, LISTEN, AND DECODE: MULTIMODAL SPEECH RECOGNITION WITH IMAGES

Times cited: 0
Authors
Sun, Felix [1]
Harwath, David [1]
Glass, James [1]
Affiliation
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
Multimodal speech recognition; image captioning; CNN; lattices
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce a multimodal speech recognition scenario, in which an image provides contextual information for a spoken caption to be decoded. We investigate a lattice rescoring algorithm that integrates information from the image at two different points: the image is used to augment the language model with the most likely words, and to rescore the top hypotheses using a word-level RNN. This rescoring mechanism decreases the word error rate by 3 absolute percentage points, compared to a baseline speech recognizer operating with only the speech recording.
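The two integration points named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it works on an n-best list rather than a lattice, the additive per-word bonus stands in for augmenting the language model, and `caption_scorer` is a hypothetical placeholder for the word-level RNN. All function names, parameters, and scores below are assumptions made for illustration.

```python
from typing import Callable, List, Set, Tuple

# A hypothesis is (words, baseline log-score from a speech-only recognizer).
Hypothesis = Tuple[List[str], float]


def boost_score(words: List[str], image_words: Set[str], bonus: float) -> float:
    """Fixed log-score bonus for each word the image model considers likely.

    Assumption: a crude stand-in for augmenting the language model.
    """
    return bonus * sum(1 for w in words if w in image_words)


def rescore_nbest(
    nbest: List[Hypothesis],
    image_words: Set[str],
    caption_scorer: Callable[[List[str]], float],
    bonus: float = 1.0,
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Combine the baseline score with both image-derived terms and re-rank."""
    rescored = []
    for words, base in nbest:
        total = (
            base
            + boost_score(words, image_words, bonus)
            + weight * caption_scorer(words)
        )
        rescored.append((words, total))
    # Highest combined log-score first.
    return sorted(rescored, key=lambda h: h[1], reverse=True)


def toy_scorer(words: List[str]) -> float:
    """Placeholder for an image-conditioned word-level RNN; always neutral."""
    return 0.0


# Made-up n-best list: the speech-only recognizer prefers the wrong word.
nbest = [
    ("a dock runs on the beach".split(), -9.5),
    ("a dog runs on the beach".split(), -10.0),
]
image_words = {"dog", "beach"}  # hypothetical words predicted from the image

best_words, best_score = rescore_nbest(nbest, image_words, toy_scorer)[0]
print(" ".join(best_words), best_score)
```

In this toy case the second hypothesis contains two image-supported words and the first only one, so the image evidence is enough to reverse the speech-only ranking.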
Pages: 573-578
Page count: 6