Experimenting with lipreading for large vocabulary continuous speech recognition

Cited by: 4
Author
Palecek, Karel [1 ]
Affiliation
[1] Tech Univ Liberec, Inst Informat Technol & Elect, Liberec 46117, Czech Republic
Keywords
Audiovisual speech recognition; Lipreading; LVCSR; Audiovisual speech
DOI
10.1007/s12193-018-0266-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The vast majority of current research on audiovisual speech recognition via lipreading from frontal face videos focuses on simple cases such as isolated phrase recognition or structured speech, where the vocabulary is limited to several tens of units. In this paper, we diverge from these traditional applications and investigate the effect of incorporating visual and depth information in continuous speech recognition with vocabulary sizes ranging from several hundred to half a million words. To this end, we evaluate various visual speech parametrizations, both existing and novel, that are designed to capture different kinds of information in the video and depth signals. The experiments are conducted on a moderately sized dataset of 54 speakers, each uttering 100 sentences in Czech. Both the video and depth data were captured with the Microsoft Kinect device. We show that even for large vocabularies the visual signal contains enough information to improve word accuracy by up to 22% relative to acoustic-only recognition. Somewhat surprisingly, a relative improvement of up to 16% has also been reached using the interpolated depth data.
Pages: 309-318
Number of pages: 10