Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition

Cited by: 17
Authors
Shahnawazuddin, Syed [1]
Sinha, Rohit [2]
Pradhan, Gayadhar [1]
Affiliations
[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna 800005, Bihar, India
[2] Indian Inst Technol, Dept Elect & Elect Engn, Gauhati 781039, India
Keywords
Automatic speech recognition (ASR); deep neural network (DNN); pitch-adaptive features; spectral smoothening; subspace Gaussian mixture model (SGMM); Gaussian mixture model; representations; noise
DOI
10.1109/LSP.2017.2705085
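Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract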
In this letter, the effectiveness of the recently reported SMAC (Spectral Moment time-frequency distribution Augmented by low-order Cepstral) features is evaluated for robust automatic speech recognition (ASR). The SMAC features consist of normalized first central spectral moments appended with low-order cepstral coefficients. These features were designed to be robust to both additive noise and pitch variations. We explore the SMAC features in a severe pitch-mismatch ASR task, i.e., decoding children's speech with an ASR system trained on adults' speech. In such tasks, the SMAC features are still observed to be sensitive to pitch variations. To address this, a simple spectral smoothening approach employing adaptive cepstral truncation is explored prior to the computation of spectral moments. With the proposed modification, the SMAC features achieve enhanced pitch robustness without affecting their noise immunity. Furthermore, the effectiveness of the proposed features is explored in three dominant acoustic modeling paradigms and under varying data conditions. In all cases, the proposed features are observed to significantly outperform the existing ones.
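The idea of smoothing a spectrum by truncating its cepstrum can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "adaptive" means the truncation point follows the pitch period of the frame, and the function name, the `fraction` parameter and its default value are hypothetical choices made only for illustration.

```python
import numpy as np

def smooth_log_spectrum(frame, sample_rate, pitch_hz, n_fft=512, fraction=0.8):
    """Smooth a frame's log-magnitude spectrum by low-quefrency liftering.

    Assumption (not from the source): cepstral coefficients are kept only
    below `fraction` times the pitch period, so the cutoff adapts to pitch.
    """
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-10)

    # Real cepstrum: inverse FFT of the (real, even) log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n_fft)

    # Pitch period in samples; a higher pitch gives a shorter period and
    # therefore a more aggressive truncation.
    pitch_period = sample_rate / pitch_hz
    cutoff = min(max(1, int(fraction * pitch_period)), n_fft // 2)

    # Keep quefrencies 0..cutoff-1 and their mirrored counterparts.
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0
    lifter[n_fft - cutoff + 1:] = 1.0

    # Back to the frequency domain: a smoothed log-magnitude spectrum
    # with n_fft // 2 + 1 bins.
    return np.fft.rfft(cepstrum * lifter).real
```

Discarding the high-quefrency coefficients removes the harmonic ripple that pitch imprints on the spectrum, so any spectral moments computed afterwards would depend less on the speaker's pitch. The smoothed output can be exponentiated to recover a magnitude spectrum if the later processing expects one.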
Pages: 1128-1132
Number of pages: 5