Large vocabulary audio-visual speech recognition using the Janus speech recognition toolkit

Cited by: 0

Authors:

Kratt, J [1]

Metze, F [1]

Stiefelhagen, R [1]

Waibel, A [1]

Affiliations:

[1] Univ Karlsruhe, Interact Syst Labs, Karlsruhe, Germany

Source:

PATTERN RECOGNITION | 2004 / Vol. 3175

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes audio-visual speech recognition experiments on a multi-speaker, large-vocabulary corpus using the Janus speech recognition toolkit. We describe a complete audio-visual speech recognition system and present experiments on this corpus. By using visual cues as additional input to the speech recognizer, we observed good improvements on both clean and noisy speech.

Pages: 488-495

Page count: 8

