An evaluation of adaptive beamformer based on average speech spectrum for noisy speech recognition

Authors
Nishiura, T [1]
Nakayama, M [1]
Nakamura, S [1]
Affiliations
[1] ATR, Spoken Language Translat Res Labs, Kyoto 6190288, Japan
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Distant-talking speech recognition in noisy environments is indispensable for self-moving robots and teleconference systems. However, background noise and room reverberation seriously degrade sound-capture quality in real acoustic environments. A microphone array is an effective means of capturing distant-talking speech. AMNOR (Adaptive Microphone-array for NOise Reduction) was proposed by Kaneda et al. as an adaptive beamformer for capturing desired distant signals in noisy environments. Although AMNOR has proven effective, it can be further improved if the spectral characteristics of the desired distant signal are known in advance. We therefore treated speech as the desired distant signal and designed an AMNOR based on the average speech spectrum. This paper focuses on the performance of this AMNOR for distant-talking speech capture and recognition. Evaluation experiments in real acoustic environments confirmed that ASR (Automatic Speech Recognition) performance in noisy environments improved by 5 to 10% with the AMNOR based on the average speech spectrum. In addition, the proposed AMNOR provides better noise reduction than the conventional AMNOR.
Pages: 209 - 212
Page count: 4