Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition

Cited by: 17

Authors:

Shahnawazuddin, Syed [1]

Sinha, Rohit [2]

Pradhan, Gayadhar [1]

Affiliations:

[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna 800005, Bihar, India

[2] Indian Inst Technol, Dept Elect & Elect Engn, Gauhati 781039, India

Source:

IEEE SIGNAL PROCESSING LETTERS | 2017 / Vol. 24 / No. 08

Keywords:

Automatic speech recognition (ASR); deep neural network (DNN); pitch-adaptive features; spectral smoothening; subspace Gaussian mixture model (SGMM); GAUSSIAN MIXTURE MODEL; REPRESENTATIONS; NOISE;

DOI:

10.1109/LSP.2017.2705085

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

In this letter, the effectiveness of the recently reported SMAC (Spectral Moment time-frequency distribution Augmented by low-order Cepstral) features is evaluated for robust automatic speech recognition (ASR). The SMAC features consist of normalized first central spectral moments appended with low-order cepstral coefficients. These features were designed to be robust to both additive noise and pitch variations. We explore the SMAC features in a severely pitch-mismatched ASR task, i.e., decoding children's speech on an ASR system trained on adults' speech. In this task, the SMAC features are still observed to be sensitive to pitch variations. To address this, a simple spectral smoothening approach employing adaptive cepstral truncation is explored prior to the computation of spectral moments. With the proposed modification, the SMAC features are noted to achieve enhanced pitch robustness without affecting their noise immunity. Furthermore, the effectiveness of the proposed features is explored in three dominant acoustic modeling paradigms and under varying data conditions. In all cases, the proposed features are observed to significantly outperform the existing ones.
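The abstract mentions smoothing the spectrum by cepstral truncation before spectral moments are computed. The following is a minimal sketch of that general idea, not the authors' implementation: the rule tying the truncation point to the pitch period (the hypothetical `fraction` parameter), the function names, and the use of a plain spectral centroid as the first moment are all assumptions made for illustration.

```python
import numpy as np


def smooth_spectrum(frame, sample_rate, pitch_hz, fraction=0.8):
    """Smooth a magnitude spectrum by truncating the real cepstrum.

    Assumption (not from the source): quefrencies at or above
    `fraction` of the pitch period (in samples) are zeroed out.
    """
    n = len(frame)
    # Log-magnitude spectrum of the frame (small offset avoids log(0)).
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
    # Real cepstrum: inverse transform of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=n)
    # Pitch-dependent truncation point, clamped to a valid range.
    cutoff = int(fraction * sample_rate / pitch_hz)
    cutoff = min(max(cutoff, 1), n // 2)
    # Symmetric lifter: keep low quefrencies and their mirror image.
    lifter = np.zeros(n)
    lifter[:cutoff] = 1.0
    if cutoff > 1:
        lifter[-(cutoff - 1):] = 1.0
    # Back to the spectral domain; the result is real up to rounding.
    smoothed_log_mag = np.fft.rfft(cepstrum * lifter).real
    return np.exp(smoothed_log_mag)


def spectral_centroid(magnitude, sample_rate):
    """First spectral moment (centroid) in Hz of a one-sided spectrum."""
    freqs = np.linspace(0.0, sample_rate / 2.0, len(magnitude))
    return float(np.sum(freqs * magnitude) / np.sum(magnitude))


# Synthetic high-pitched frame: harmonics of 300 Hz plus a little noise.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(400) / sample_rate
frame = sum(np.sin(2 * np.pi * 300 * k * t) / k for k in range(1, 10))
frame = (frame + 0.01 * rng.standard_normal(len(t))) * np.hamming(len(t))

smoothed = smooth_spectrum(frame, sample_rate, pitch_hz=300.0)
centroid = spectral_centroid(smoothed, sample_rate)
```

Because the truncation point shrinks as the pitch rises, the pitch harmonics of a child's voice are removed from the envelope more aggressively than a fixed cutoff would allow, which is the intuition behind an adaptive rule.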


Pages: 1128-1132

Page count: 5

50 records in total

[41] Model compensation using robust features for robust speech recognition
Zhang, Jun
Wei, Gang
Shuju Caiji Yu Chuli/Journal of Data Acquisition and Processing, 2003, 18 (03)
[42] An Investigation into the Effect of Pitch Transformation on Children Speech Recognition
Ghai, Shweta
Sinha, Rohit
2008 IEEE REGION 10 CONFERENCE: TENCON 2008, VOLS 1-4, 2008: 1731 - 1736
[43] Significance of Group Delay based Acoustic Features in the Linguistic Search Space for Robust Speech Recognition
Ramya, R.
Hegde, Rajesh M.
Murthy, Hema A.
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008: 1537 - +
[44] POWER-NORMALIZED CEPSTRAL COEFFICIENTS (PNCC) FOR ROBUST SPEECH RECOGNITION
Kim, Chanwoo
Stern, Richard M.
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 4101 - 4104
[45] Speaker normalized spectral subband parameters for noise robust speech recognition
Tsuge, Satoru
Fukada, Toshiaki
Singer, Harald
Paliwal, Kuldip K.
Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi), 1999, 20 (06): 425 - 431
[46] Speaker normalized spectral subband parameters for noise robust speech recognition
Tsuge, S
Fukada, T
Singer, H
ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999: 285 - 288
[47] Feature enhancement by speaker-normalized splice for robust speech recognition
Shinohara, Yusuke
Masuko, Takashi
Akamine, Masami
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008: 4881 - 4884
[48] Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition
Kim, Chanwoo
Stern, Richard M.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (07): 1315 - 1329
[49] POWER-NORMALIZED PLP (PNPLP) FEATURE FOR ROBUST SPEECH RECOGNITION
Fan, Lichun
Ke, Dengfeng
Fu, Xiaoyin
Lu, Shixiang
Xu, Bo
2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012: 224 - 228
[50] Robust speech recognition by using compensated acoustic scores
Sato, S
Onoe, K
Kobayashi, A
Imai, T
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (03): 915 - 921
