A noise-robust speech input interface for information kiosk terminals

Cited by: 1
Authors
Ida, M [1]
Mori, H
Nakamura, S
Shikano, K
Affiliations
[1] OMRON Co, Informat Technol Res Ctr, Kyoto 6008530, Japan
[2] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma 6300101, Japan
Keywords
speech recognition; noise reduction; microphone array; spectral subtraction
DOI
10.1002/ecjb.20135
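Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and telecommunications technology]
Discipline classification code
0808; 0809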
Abstract
In social amenity apparatus such as information terminals, on-line shopping terminals, and ticket vending machines, incorporating speech recognition makes communication between machines and humans more natural and smoother, thus rapidly enhancing operation and convenience. In this paper, we describe the trial production of an information kiosk terminal equipped with speech recognition, and the noise reduction technique adopted for its speech input interface. Such equipment is generally installed in public spaces such as stations, so robust speech recognition performance is required even in the worst noise environments. To achieve this noise robustness, we use noise reduction based on both a microphone array and spectral subtraction (SS). Experiments with evaluation data recorded in real environments indicate that the combined use of a 32-channel microphone array and SS achieves a recognition performance of over 90% in isolated-word recognition of 216 words uttered by an unspecified speaker. © 2004 Wiley Periodicals, Inc.
Pages: 51-61
Number of pages: 11