Large vocabulary audio-visual speech recognition using the Janus speech recognition toolkit

被引:0
|
作者
Kratt, J [1 ]
Metze, F [1 ]
Stiefelhagen, R [1 ]
Waibel, A [1 ]
机构
[1] Univ Karlsruhe, Interact Syst Labs, Karlsruhe, Germany
来源
PATTERN RECOGNITION | 2004年 / 3175卷
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper describes audio-visual speech recognition experiments on a multi-speaker, large vocabulary corpus using the Janus speech recognition toolkit. We describe a complete audio-visual speech recognition system and present experiments on this corpus. By using visual cues as additional input to the speech recognizer, we observed good improvements, both on clean and noisy speech in our experiments.
引用
收藏
页码:488 / 495
页数:8
相关论文
共 50 条
  • [1] Large Vocabulary Continuous Audio-Visual Speech Recognition
    Sterpu, George
    [J]. ICMI'18: PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2018, : 538 - 541
  • [2] Large vocabulary audio-visual speech recognition using active shape models
    Faruquie, TA
    Majumdar, A
    Rajput, N
    Subramaniam, LV
    [J]. 15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 3, PROCEEDINGS: IMAGE, SPEECH AND SIGNAL PROCESSING, 2000, : 106 - 109
  • [3] Asynchronous stream modeling for large vocabulary audio-visual speech recognition
    Luettin, J
    Potamianos, G
    Neti, C
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 169 - 172
  • [4] Multimodal Integration for Large-Vocabulary Audio-Visual Speech Recognition
    Yu, Wentao
    Zeiler, Steffen
    Kolossa, Dorothea
    [J]. 28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 341 - 345
  • [5] Large-vocabulary Audio-visual Speech Recognition in Noisy Environments
    Yu, Wentao
    Zeiler, Steffen
    Kolossa, Dorothea
    [J]. IEEE MMSP 2021: 2021 IEEE 23RD INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2021,
  • [6] Reliability-Based Large-Vocabulary Audio-Visual Speech Recognition
    Yu, Wentao
    Zeiler, Steffen
    Kolossa, Dorothea
    [J]. SENSORS, 2022, 22 (15)
  • [7] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    [J]. APPLIED ACOUSTICS, 2023, 211
  • [8] Deep Audio-Visual Speech Recognition
    Afouras, Triantafyllos
    Chung, Joon Son
    Senior, Andrew
    Vinyals, Oriol
    Zisserman, Andrew
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 8717 - 8727
  • [9] MULTIPOSE AUDIO-VISUAL SPEECH RECOGNITION
    Estellers, Virginia
    Thiran, Jean-Philippe
    [J]. 19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1065 - 1069
  • [10] Audio-visual speech recognition by speechreading
    Zhang, XZ
    Mersereau, RM
    Clements, MA
    [J]. DSP 2002: 14TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING PROCEEDINGS, VOLS 1 AND 2, 2002, : 1069 - 1072