Cepstral compensation by polynomial approximation for environment-independent speech recognition

Cited by: 0
Authors: Raj, B; Gouvea, EB; Moreno, PJ; Stern, RM
DOI: not available
Chinese Library Classification: O42 [Acoustics]
Subject Classification Codes: 070206; 082403
Abstract
Speech recognition systems perform poorly on speech degraded by even simple effects such as linear filtering and additive noise. One possible solution to this problem is to modify the probability density function (PDF) of clean speech to account for the effects of the degradation. However, even for the case of linear filtering and additive noise, this is extremely difficult to do analytically. Previous analytical approaches to noisy speech recognition have either used an overly simplified mathematical description of the effects of noise on the statistics of speech, or they have relied on the availability of large environment-specific adaptation sets; some have required adaptation data consisting of simultaneously recorded ("stereo") clean and degraded speech. In this paper we introduce an approximation-based method that computes the effects of the environment on the parameters of the PDF of clean speech: we compensate for the effects of linear filtering and additive noise on clean speech using Vector Polynomial approximationS (VPS), and we estimate the parameters of the environment, namely the noise and the channel, using piecewise-linear approximations of these effects. We evaluate VPS using the CMU SPHINX-II system and the 100-word alphanumeric CENSUS database, at several SNRs, with artificial white Gaussian noise added to the database. VPS provides improvements of up to 15 percent in relative recognition accuracy.
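The abstract describes the environment model only qualitatively. As a concrete illustration (an assumption on my part, not quoted from the paper), a linear channel q and additive noise n act on a clean log-spectral value x as y = x + q + log(1 + exp(n - x - q)), and a VPS-style scheme replaces this nonlinearity with a low-order polynomial that can then be applied to the parameters of the clean-speech PDF. The Python sketch below fits such a polynomial and uses it to shift one Gaussian mean; the scalar setting, the polynomial order, and all variable names are hypothetical, not taken from the paper.

import numpy as np

def env_function(x, q, n):
    # Effect of channel q and additive noise n on clean log-spectrum x:
    # y = x + q + log(1 + exp(n - x - q))  (standard log-spectral model)
    return x + q + np.log1p(np.exp(n - x - q))

# Assumed channel gain and noise level in the log domain (hypothetical values).
q, n = 2.0, 5.0

# Fit a cubic polynomial approximation of the environment function over a
# grid of plausible clean-speech values (order and range are assumptions).
x_grid = np.linspace(-5.0, 20.0, 200)
coeffs = np.polyfit(x_grid, env_function(x_grid, q, n), deg=3)

# Compensate one Gaussian mean of the clean-speech PDF by evaluating the
# polynomial at the mean -- a first-moment simplification of the vector
# statistics the paper's method operates on.
mu_clean = 8.0
mu_degraded = np.polyval(coeffs, mu_clean)
print(f"clean mean {mu_clean:.2f} -> degraded mean {mu_degraded:.2f}")

Once fitted, such a polynomial can be evaluated in closed form for every Gaussian in the recognizer, which is what makes an approximation of this kind cheaper than recomputing the degraded PDF numerically.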
Pages: 2340-2343 (4 pages)
Related Papers (50 in total)
  • [31] Feature compensation based on independent noise estimation for robust speech recognition
    Lu, Yong
    Lin, Han
    Wu, Pingping
    Chen, Yitao
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
  • [32] A Hybrid Image Augmentation Technique for User- and Environment-Independent Hand Gesture Recognition Based on Deep Learning
    Awaluddin, Baiti-Ahmad
    Chao, Chun-Tang
    Chiou, Juing-Shian
    MATHEMATICS, 2024, 12 (09)
  • [33] Noisy Lombard and Loud speech compensation approach for speech recognition in extremely adverse environment
    Tian, Bin
    Yi, Kechu
Shengxue Xuebao/Acta Acustica, 2003, 28 (01): 28-32
  • [34] Cepstral gain normalization for noise robust speech recognition
    Yoshizawa, Shingo
    Hayasaka, Noboru
    Wada, Naoya
    Miyanaga, Yoshikazu
ICASSP IEEE Int Conf Acoust Speech Signal Process Proc, 2004: I209-I212
  • [35] Gammatone Wavelet Cepstral Coefficients for Robust Speech Recognition
    Adiga, Aniruddha
    Magimai-Doss, Mathew
    Seelamantula, Chandra Sekhar
2013 IEEE INTERNATIONAL CONFERENCE OF IEEE REGION 10 (TENCON), 2013
  • [36] Cepstral shape normalization (CSN) for robust speech recognition
    Du, Jun
    Wang, Ren-Hua
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008: 4389-4392
  • [37] A comparison study of cepstral analysis with applications to speech recognition
    Zigelboim, Gabriel
    Shallom, Ilan D.
2006 INTERNATIONAL CONFERENCE ON INFORMATION AND TECHNOLOGY: RESEARCH AND EDUCATION, 2006: 30+
  • [38] A combined cepstral distance method for emotional speech recognition
    Quan, Changqin
    Zhang, Bin
    Sun, Xiao
    Ren, Fuji
INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2017, 14 (04): 1-9
  • [39] A DNA origami plasmonic sensor with environment-independent read-out
    Masciotti, Valentina
    Piantanida, Luca
    Naumenko, Denys
    Amenitsch, Heinz
    Fanetti, Mattia
    Valant, Matjaz
    Lei, Dongsheng
    Ren, Gang
    Lazzarino, Marco
NANO RESEARCH, 2019, 12 (11): 2900-2907
  • [40] Cepstral statistics compensation using online pseudo stereo codebooks for robust speech recognition in additive noise environments
    Hung, Jeih-weih
2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006: 513-516