Toward Robust Speech Recognition and Understanding

Cited by: 0
Author
Sadaoki Furui
Affiliation
[1] Tokyo Institute of Technology, Department of Computer Science
Keywords
speech recognition; speech understanding; robustness; adaptation; spontaneous speech; corpus; acoustic models; language models; dialogue; multi-modal; summarization
DOI
Not available
Abstract
The principal cause of speech recognition errors is a mismatch between trained acoustic/language models and input speech due to the limited amount of training data in comparison with the vast variation of speech. It is crucial to establish methods that are robust against voice variation due to individuality, the physical and psychological condition of the speaker, telephone sets, microphones, network characteristics, additive background noise, speaking styles, and other aspects. This paper overviews robust architecture and modeling techniques for speech recognition and understanding. The topics include acoustic and language modeling for spontaneous speech recognition, unsupervised adaptation of acoustic and language models, robust architecture for spoken dialogue systems, multi-modal speech recognition, and speech summarization. This paper also discusses the most important research problems to be solved in order to achieve ultimate robust speech recognition and understanding systems.
Pages: 245 - 254
Page count: 9
Related papers
  • [1] Toward robust speech recognition and understanding
    Furui, S
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2003, 2807 : 2 - 11
  • [2] Toward robust speech recognition and understanding
    Furui, S
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2005, 41 (03) : 245 - 254
  • [3] SPEECH RECOGNITION AND UNDERSTANDING
    VINTSYUK, TK
    CYBERNETICS, 1982, 18 (05) : 657 - 669
  • [4] A robust speech analysis in speech recognition
    Miyanaga, Y
    Gozen, S
    Ohtsuki, N
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000 : 706 - 709
  • [5] ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
    Lu, Yen-Ju
    Chang, Xuankai
    Li, Chenda
    Zhang, Wangyou
    Cornell, Samuele
    Ni, Zhaoheng
    Masuyama, Yoshiki
    Yan, Brian
    Scheibler, Robin
    Wang, Zhong-Qiu
    Tsao, Yu
    Qian, Yanmin
    Watanabe, Shinji
    INTERSPEECH 2022, 2022 : 5458 - 5462
  • [6] Toward Robust Mispronunciation Detection via Audio-Visual Speech Recognition
    Karbasi, Mahdie
    Zeiler, Steffen
    Freiwald, Jan
    Kolossa, Dorothea
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2019, PT II, 2019, 11507 : 655 - 666
  • [7] Speech parameters for the robust emotional speech recognition
    Kim W.-G.
    Journal of Institute of Control, Robotics and Systems, 2010, 16 (12) : 1137 - 1142
  • [8] Japanese speech databases for robust speech recognition
    Nakamura, A
    Matsunaga, S
    Shimizu, T
    Tonomura, M
    Sagisaka, Y
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996 : 2199 - 2202
  • [9] Robust recognition of fast speech
    Lee, Ki-Seung
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (08) : 2456 - 2459
  • [10] Robust speech detector for speech recognition applications
    Liang, WQ
    Chen, YN
    Shan, YX
    Liu, J
    Liu, RS
    2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002 : 453 - 456