Two-stage algorithm of spectral analysis for the automatic speech recognition systems

Cited by: 0
Authors
V. V. Savchenko [1]
L. V. Savchenko [1]
Affiliations
[1] National Research University "Higher School of Economics"
Keywords
Speech signal; Spectral analysis; Vocal tract; Autoregressive model; All-pole model; Artificial neural network; Data augmentation
DOI
10.1007/s11018-024-02376-0
Abstract
We consider the spectral analysis of speech signals in automatic speech recognition systems as part of a rapidly developing line of research in acoustic measurements. We show that the efficiency of such systems under unfavorable conditions of speech production (noise and insufficient intelligibility of speech sounds) is low compared with human perception of oral speech. To improve the efficiency of automatic speech recognition systems, we propose a two-stage algorithm for the spectral analysis of speech signals. In the first stage, the speech signal undergoes parametric spectral analysis based on an autoregressive model of the vocal tract of a conventional speaker. In the second stage, the resulting spectral estimate is transformed (modified) according to the principle of frequency-selective amplification of the amplitudes of the main formants of the intraperiod power spectrum. We describe a software implementation of the proposed algorithm based on the fast Fourier transform. Using software developed by the authors, we performed full-scale experiments on an additive mixture of vowel sounds from the speech of a control speaker and white Gaussian noise. The experimental results indicate that the amplitudes of the main formants of the speech signals are amplified by 10–20 dB and, hence, that the intelligibility of speech sounds improves substantially. The developed algorithm can be used in automatic speech recognition systems that process speech signals in the frequency domain, including systems based on artificial neural networks.
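The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Stage 1 below is a standard Yule-Walker all-pole estimate whose frequency response is evaluated with an FFT. Stage 2 is a hypothetical stand-in, since the abstract does not give the transformation rule: it raises the spectrum, normalized by its geometric mean, to a power `gamma`, which stretches peaks relative to the spectral floor. The function names, the model order of 12, and the `gamma` parameter are all assumptions made for illustration.

```python
import numpy as np


def ar_power_spectrum(x, order=12, n_fft=512):
    """Stage 1: all-pole (autoregressive) power spectrum via Yule-Walker."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0], ..., r[order]
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz autocorrelation matrix of the Yule-Walker equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - np.dot(a, r[1:])  # prediction-error variance
    # A(f) = 1 - sum_k a_k exp(-j 2 pi f k), sampled by a zero-padded FFT
    A = np.fft.rfft(np.concatenate(([1.0], -a)), n_fft)
    return sigma2 / np.abs(A) ** 2


def emphasize_formants(psd, gamma=2.0):
    """Stage 2 (assumed rule, not from the paper): power-law peak emphasis.

    Values above the geometric-mean level are amplified and values below
    it are attenuated; in dB, distances from that level scale by gamma.
    """
    psd = np.asarray(psd, dtype=float)
    ref = np.exp(np.mean(np.log(psd)))  # geometric mean as reference level
    return ref * (psd / ref) ** gamma


if __name__ == "__main__":
    # Synthetic test signal (assumption): one resonance-like tone in white noise
    rng = np.random.default_rng(0)
    fs = 8000
    t = np.arange(2048) / fs
    x = np.sin(2 * np.pi * 700 * t) + 0.3 * rng.standard_normal(t.size)
    psd = ar_power_spectrum(x)
    enhanced = emphasize_formants(psd)
    gain_db = 10 * np.log10(enhanced.max() / psd.max())
    print(f"peak gain from the assumed stage-2 rule: {gain_db:.1f} dB")
```

The power-law rule was chosen only because it is monotonic, so peak positions are preserved and only their prominence changes. The gain it produces depends on `gamma` and on the signal, and should not be read as reproducing the 10–20 dB figure reported in the abstract.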
Pages: 553-563
Number of pages: 10
Related papers
50 items in total
  • [21] Two-stage learning algorithm for biomedical named entity recognition
    Che X.-J.
    Xu H.
    Pan M.-Y.
    Liu Q.-L.
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2023, 53 (08) : 2380 - 2387
  • [22] Automatic benthic imagery recognition using a hierarchical two-stage approach
    Tadas Rimavičius
    Adas Gelžinis
    Antanas Verikas
    Evaldas Vaičiukynas
    Marija Bačauskienė
    Aleksėj Šaškov
    Signal, Image and Video Processing, 2018, 12 : 1107 - 1114
  • [23] Automatic benthic imagery recognition using a hierarchical two-stage approach
    Rimavicius, Tadas
    Gelzinis, Adas
    Verikas, Antanas
    Vaiciukynas, Evaldas
    Bacauskiene, Marija
    Saskov, Aleksej
    SIGNAL IMAGE AND VIDEO PROCESSING, 2018, 12 (06) : 1107 - 1114
  • [24] A two-stage algorithm for identification of nonlinear dynamic systems
    Li, Kang
    Peng, Jian-Xun
    Bai, Er-Wei
    AUTOMATICA, 2006, 42 (07) : 1189 - 1197
  • [25] Automatic speech recognition systems
    Catariov, A
    Information Technologies 2004, 2004, 5822 : 83 - 93
  • [26] A Two-Stage Strategy to Introduce Spectral Matching into Recognition of Occluded Objects
    Wu, Jia Yun
    Chen, Xiao
    2012 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO 2012), 2012,
  • [27] Two-Stage Phone Recognition System using Articulatory and Spectral Features
    Manjunath, K. E.
    Rao, K. Sreenivasa
    Reddy, Gurunath M.
    2015 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION ENGINEERING SYSTEMS (SPACES), 2015, : 107 - 111
  • [28] A two-stage algorithm for one-microphone reverberant speech enhancement
    Wu, MY
    Wang, DL
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (03) : 774 - 784
  • [29] Multi-classification speech emotion recognition based on two-stage bottleneck features selection and MCJD algorithm
    Sun, Linhui
    Huang, Yiqing
    Li, Qiu
    Li, Pingan
    SIGNAL IMAGE AND VIDEO PROCESSING, 2022, 16 (05) : 1253 - 1261
  • [30] Multi-classification speech emotion recognition based on two-stage bottleneck features selection and MCJD algorithm
    Linhui Sun
    Yiqing Huang
    Qiu Li
    Pingan Li
    Signal, Image and Video Processing, 2022, 16 : 1253 - 1261