Robust endpoint detection and energy normalization for real-time speech and speaker recognition

Cited by: 130
Authors
Li, Q [1]
Zheng, JS [1]
Tsai, A [1]
Zhou, QR [1]
Affiliations
[1] Lucent Technol, Bell Labs, Multimedia Commun Res Lab, Murray Hill, NJ 07974 USA
Source
Keywords
change-point detection; edge detection; endpoint detection; optimal filter; robust speech recognition; speaker verification; speech activity detection; speech detection
DOI
10.1109/TSA.2002.1001979
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
When automatic speech recognition (ASR) and speaker verification (SV) are applied in adverse acoustic environments, endpoint detection and energy normalization can be crucial to the functioning of both systems. In low signal-to-noise ratio (SNR) and nonstationary environments, conventional approaches to endpoint detection and energy normalization often fail, and ASR performance usually degrades dramatically. The purpose of this paper is to address the endpoint problem. For ASR, we propose a real-time approach that uses an optimal filter plus a three-state transition diagram for endpoint detection. The filter is designed using several criteria to ensure accuracy and robustness, and its response is almost invariant across background noise levels. The detected endpoints are then applied sequentially to energy normalization. Evaluation results show that the proposed algorithm significantly reduces string error rates in low-SNR situations; the reduction exceeds 50% on several of the evaluated databases. For SV, we propose a batch-mode approach that uses the optimal filter plus a two-mixture energy model for endpoint detection. Experiments show that the batch-mode algorithm detects endpoints as accurately as HMM forced alignment while having much lower computational complexity.
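The real-time scheme outlined in the abstract (an edge-detecting filter over the energy contour feeding a three-state transition diagram) can be illustrated with a minimal Python sketch. Everything specific here is an assumption, not taken from the paper: the filter is a simple difference-of-means stand-in rather than the authors' optimal filter, and the state names, the thresholds `rise` and `fall`, and the hang time `hang` are hypothetical.

```python
import math

# Hypothetical state names; the abstract only says "three-state".
SILENCE, IN_SPEECH, LEAVING_SPEECH = "silence", "in_speech", "leaving_speech"

def log_energy(frame):
    """Log energy (dB) of one frame of samples, floored to avoid log(0)."""
    return 10.0 * math.log10(sum(s * s for s in frame) / len(frame) + 1e-10)

def edge_filter(energies, half_width=3):
    """Stand-in edge detector (NOT the paper's optimal filter):
    mean of the next frames minus mean of the previous frames.
    Positive output marks rising energy, negative marks falling energy."""
    out = []
    for t in range(len(energies)):
        ahead = energies[t + 1 : t + 1 + half_width] or [energies[t]]
        behind = energies[max(0, t - half_width) : t] or [energies[t]]
        out.append(sum(ahead) / len(ahead) - sum(behind) / len(behind))
    return out

def detect_endpoints(energies, rise=6.0, fall=-6.0, hang=5):
    """Return (start, end) frame-index pairs found by a three-state machine.
    rise/fall are assumed thresholds on the filter output; hang is the number
    of frames to wait after a falling edge before closing the segment."""
    state, start, last_fall, segments = SILENCE, None, None, []
    for t, f in enumerate(edge_filter(energies)):
        if state == SILENCE:
            if f >= rise:                      # rising edge: speech begins
                state, start = IN_SPEECH, t
        elif state == IN_SPEECH:
            if f <= fall:                      # falling edge: maybe the end
                state, last_fall = LEAVING_SPEECH, t
        else:                                  # LEAVING_SPEECH
            if f >= rise:                      # energy came back: short pause
                state = IN_SPEECH
            elif t - last_fall >= hang:        # stayed low long enough
                segments.append((start, last_fall))
                state = SILENCE
    if state == IN_SPEECH:                     # input ended during speech
        segments.append((start, len(energies) - 1))
    elif state == LEAVING_SPEECH:              # input ended before hang expired
        segments.append((start, last_fall))
    return segments
```

On a synthetic contour such as `[20.0] * 10 + [50.0] * 10 + [20.0] * 10`, `detect_endpoints` returns a single segment whose boundaries fall within a few frames of the true edges. The offset comes from the lookahead in the stand-in filter and the low thresholds, which is one reason a carefully designed filter matters in practice.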
Pages: 146-157
Page count: 12