A Front-End Speech Enhancement System for Robust Automotive Speech Recognition

被引:0
|
作者
Wang, Haikun [1 ]
Ye, Zhongfu [1 ]
Chen, Jingdong [2 ]
机构
[1] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230027, Anhui, Peoples R China
[2] Northwestern Polytech Univ, Ctr Intelligent Acoust & Immers Commun, Xian 710072, Shaanxi, Peoples R China
关键词
Speech enhancement; model-based; voice activity detection; microphone array; relative transfer function estimation; Generalized sidelobe cancellation; speech recognition; VOICE ACTIVITY DETECTION;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper presents a front-end speech enhancement approach to robust speech recognition in automotive environments. It combines model-based voice activity detection (VAD), relative transfer function (RTF) based generalized sidelobe cancelation, and single-channel post filtering to enhance the speech signal of interest, thereby improving the robustness of speech recognition. First, we choose four typical driving scenarios, which include most of the noise types in automobiles to record training data. The recorded data are then used to train Gaussian mixture models (GMMs) for both speech and noise. The trained GMMs are subsequently used to estimate the speech presence probability on a frame-by-frame basis. This speech presence probability is then served as the basic information for RTF estimation, adaptive beamforming, and post-filtering. Experiments are conducted in real automotive environments and the results show that the developed method can significantly improve the performance of both VAD and automatic speech recognition (ASR).
引用
收藏
页码:1 / 5
页数:5
相关论文
共 50 条
  • [1] A robust front-end for telephone speech recognition
    Cho, HY
    Chi, SM
    Oh, YH
    [J]. PRICAI'98: TOPICS IN ARTIFICIAL INTELLIGENCE, 1998, 1531 : 636 - 644
  • [2] Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition
    Narayanan, Arun
    Wang, DeLiang
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 826 - 835
  • [3] A comparison of front-end configurations for robust speech recognition
    Milner, B
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 797 - 800
  • [4] Optimization of Speech Enhancement Front-end with Speech Recognition-level Criterion
    Higuchi, Takuya
    Yoshioka, Takuya
    Nakatani, Tomohiro
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3808 - 3812
  • [5] Comparing Front-End Enhancement Techniques and Multiconditioned Training for Robust Automatic Speech Recognition
    Soni, Meet H.
    Joshi, Sonal
    Panda, Ashish
    [J]. TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697 : 329 - 340
  • [6] Robust connected digit recognition using speech enhancement and an auditory model front-end
    Flynn, Ronan
    Jones, Edward
    [J]. 2007 6TH INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS & SIGNAL PROCESSING, VOLS 1-4, 2007, : 410 - +
  • [7] Robust Front-End Processing For Emotion Recognition In Noisy Speech
    Pandharipande, Meghna
    Chakraborty, Rupayan
    Panda, Ashish
    Kopparapu, Sunil Kumar
    [J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 324 - 328
  • [8] Performance evaluation of front-end algorithms for robust speech recognition
    Cheng, O
    Abdulla, W
    Salcic, Z
    [J]. ISSPA 2005: THE 8TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1 AND 2, PROCEEDINGS, 2005, : 711 - 714
  • [9] ROBUST FRONT-END PROCESSING FOR SPEECH RECOGNITION IN NOISY CONDITIONS
    Das, Biswajit
    Panda, Ashish
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5235 - 5239
  • [10] A Reassigned Front-End for Speech Recognition
    Tryfou, Georgina
    Omologo, Maurizio
    [J]. 2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, : 553 - 557