Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition

Cited by: 17
Authors
Shahnawazuddin, Syed [1]
Sinha, Rohit [2]
Pradhan, Gayadhar [1]
Affiliations
[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna 800005, Bihar, India
[2] Indian Inst Technol, Dept Elect & Elect Engn, Gauhati 781039, India
Keywords
Automatic speech recognition (ASR); deep neural network (DNN); pitch-adaptive features; spectral smoothening; subspace Gaussian mixture model (SGMM); GAUSSIAN MIXTURE MODEL; REPRESENTATIONS; NOISE;
DOI
10.1109/LSP.2017.2705085
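Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;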
Abstract
In this letter, the effectiveness of the recently reported SMAC (Spectral Moment time-frequency distribution Augmented by low-order Cepstral) features is evaluated for robust automatic speech recognition (ASR). The SMAC features consist of normalized first central spectral moments appended with low-order cepstral coefficients, and they were designed to be robust to both additive noise and pitch variations. We explore the SMAC features in a severe pitch-mismatch ASR task, namely decoding children's speech on an ASR system trained on adults' speech. In this task, the SMAC features are still observed to be sensitive to pitch variations. To address this, a simple spectral smoothening approach employing adaptive cepstral truncation is applied before the spectral moments are computed. With the proposed modification, the SMAC features achieve enhanced pitch robustness without losing their noise immunity. Furthermore, the effectiveness of the proposed features is explored across three dominant acoustic modeling paradigms and varying data conditions. In all cases, the proposed features are observed to significantly outperform the existing ones.
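The processing chain summarized in the abstract (cepstral smoothing with a pitch-dependent truncation point, then normalized subband spectral moments with low-order cepstral coefficients appended) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. The truncation rule `cutoff = alpha * fs / f0`, the equal-width linear subbands, the Hamming window, the assumption that f0 is already known, and the names `smac_like_features`, `alpha`, `num_bands`, and `num_cep` are all hypothetical choices made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for a single short frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(frame)]
    # log_mag is real and even, so its inverse DFT equals its forward DFT divided by n.
    return [v.real / n for v in dft(log_mag)]


def smoothed_log_spectrum(cep, cutoff):
    """Lifter the cepstrum (keep quefrencies below `cutoff`, mirrored) and map back to a log spectrum."""
    n = len(cep)
    liftered = [c if (q < cutoff or q > n - cutoff) else 0.0 for q, c in enumerate(cep)]
    return [v.real for v in dft(liftered)]


def subband_centroids(power, num_bands):
    """First spectral moment (centroid) of each equal-width subband, normalized to [0, 1] within its band."""
    half = len(power) // 2 + 1  # non-negative frequency bins only
    edges = [round(i * (half - 1) / num_bands) for i in range(num_bands + 1)]
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        bins = range(lo, hi + 1)
        total = sum(power[k] for k in bins)
        centroid = sum(k * power[k] for k in bins) / total
        feats.append((centroid - lo) / (hi - lo))
    return feats


def smac_like_features(frame, fs, f0, num_bands=8, num_cep=3, alpha=0.8):
    """Hypothetical SMAC-style feature vector for one frame with known pitch f0 (Hz)."""
    n = len(frame)
    # Hamming window to reduce spectral leakage.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    cep = real_cepstrum(windowed)
    # Hypothetical adaptive truncation: cut off below the pitch period (in samples),
    # so higher-pitched speech is smoothed more strongly.
    cutoff = max(1, int(alpha * fs / f0))
    power = [math.exp(2.0 * s) for s in smoothed_log_spectrum(cep, cutoff)]
    # Normalized subband moments followed by low-order cepstral coefficients c0..c(num_cep-1).
    return subband_centroids(power, num_bands) + cep[:num_cep]


if __name__ == "__main__":
    fs, f0 = 8000, 250.0
    # Synthetic voiced frame: ten harmonics of f0 with 1/h amplitudes.
    frame = [sum(math.sin(2 * math.pi * h * f0 * t / fs) / h for h in range(1, 11))
             for t in range(256)]
    feats = smac_like_features(frame, fs, f0)
    print(len(feats))  # 11
```

In this sketch, a higher f0 (as in children's speech) gives a shorter pitch period and therefore a smaller cutoff, so more of the cepstrum is discarded and the harmonic ripple is smoothed away before the subband moments are computed.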
Pages: 1128-1132
Number of pages: 5
Related papers (50 in total)
  • [41] Model compensation using robust features for robust speech recognition
    Zhang, Jun
    Wei, Gang
    Shuju Caiji Yu Chuli/Journal of Data Acquisition and Processing, 2003, 18 (03):
  • [42] An Investigation into the Effect of Pitch Transformation on Children Speech Recognition
    Ghai, Shweta
    Sinha, Rohit
    2008 IEEE REGION 10 CONFERENCE: TENCON 2008, VOLS 1-4, 2008: 1731 - 1736
  • [43] Significance of Group Delay based Acoustic Features in the Linguistic Search Space for Robust Speech Recognition
    Ramya, R.
    Hegde, Rajesh M.
    Murthy, Hema A.
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008: 1537 - +
  • [44] POWER-NORMALIZED CEPSTRAL COEFFICIENTS (PNCC) FOR ROBUST SPEECH RECOGNITION
    Kim, Chanwoo
    Stern, Richard M.
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 4101 - 4104
  • [45] Speaker normalized spectral subband parameters for noise robust speech recognition
    Tsuge, Satoru
    Fukada, Toshiaki
    Singer, Harald
    Paliwal, Kuldip K.
    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi), 1999, 20 (06): 425 - 431
  • [46] Speaker normalized spectral subband parameters for noise robust speech recognition
    Tsuge, S
    Fukada, T
    Singer, H
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999: 285 - 288
  • [47] Feature enhancement by speaker-normalized splice for robust speech recognition
    Shinohara, Yusuke
    Masuko, Takashi
    Akamine, Masami
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008: 4881 - 4884
  • [48] Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition
    Kim, Chanwoo
    Stern, Richard M.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (07) : 1315 - 1329
  • [49] POWER-NORMALIZED PLP (PNPLP) FEATURE FOR ROBUST SPEECH RECOGNITION
    Fan, Lichun
    Ke, Dengfeng
    Fu, Xiaoyin
    Lu, Shixiang
    Xu, Bo
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012: 224 - 228
  • [50] Robust speech recognition by using compensated acoustic scores
    Sato, S
    Onoe, K
    Kobayashi, A
    Imai, T
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (03): 915 - 921