ENHANCED POWER-NORMALIZED FEATURES FOR MANDARIN ROBUST SPEECH RECOGNITION BASED ON A VOICED-UNVOICED-SILENCE DECISION

Authors
Tan, Ying-Wei [1]
Liu, Wen-Ju [1]
Yang, Zhen-Lei [1]
Chen, Ming-Ming [1]
Affiliation
[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing, Peoples R China
Keywords
Mandarin robust speech recognition; a voiced-unvoiced-silence decision; enhanced power-normalized features; a weighted harmonic-noise-model; GAMMATONE FILTERBANK;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Power-normalized features have been shown to improve the performance of English large-vocabulary continuous speech recognition under different acoustic conditions. In this paper, considering the tonal characteristics of Mandarin speech, we apply a different strategy to each class of sound, based on a voiced-unvoiced-silence decision. For voiced sounds, harmonic enhancement based on a weighted harmonic-noise-model (WHNM) is applied to capture the salient harmonic information accurately and to decrease the effect of various non-stationary noises; standard power-normalized processing (SPNP) is then performed. For unvoiced sounds, only the SPNP is used. For silence, a quality frame dropping (FD) algorithm is incorporated into the front-end. The resulting enhanced power-normalized features are used to process noise-corrupted Mandarin speech. The experimental results show higher recognition accuracy than the ETSI Advanced Front-End (AFE) for Mandarin continuous speech recognition in noisy environments.
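The per-class routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the voiced-unvoiced-silence classifier, the WHNM enhancement, or the SPNP stages, so everything here is a hypothetical stand-in. The function names, the energy and zero-crossing thresholds, the 16 kHz framing (400-sample frames, 160-sample hop), and the 1/15 power-law exponent are all assumptions chosen for illustration only.

```python
import numpy as np


def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (assumed 25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])


def vus_decision(frame, energy_thresh=1e-4, zcr_thresh=0.25):
    """Toy voiced/unvoiced/silence decision from energy and zero-crossing rate.

    This heuristic and its thresholds are assumptions, not the paper's classifier.
    """
    energy = float(np.mean(frame ** 2))
    if energy < energy_thresh:
        return "silence"
    signs = np.signbit(frame)
    zcr = float(np.mean(signs[1:] != signs[:-1]))
    return "unvoiced" if zcr > zcr_thresh else "voiced"


def harmonic_enhance(frame):
    """Placeholder for WHNM-based harmonic enhancement (details not in the abstract)."""
    return frame


def spnp(frame, exponent=1.0 / 15.0):
    """Stand-in for standard power-normalized processing.

    Only a windowed power spectrum with power-law compression is computed here;
    the real processing chain is not described in the abstract.
    """
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    return power ** exponent


def extract_features(x):
    """Route each frame by its class: enhance voiced, pass unvoiced, drop silence."""
    feats, labels = [], []
    for frame in frame_signal(x):
        label = vus_decision(frame)
        labels.append(label)
        if label == "silence":
            continue  # frame dropping: silence frames produce no feature vector
        if label == "voiced":
            frame = harmonic_enhance(frame)
        feats.append(spnp(frame))
    return np.array(feats), labels
```

Calling `extract_features` on a waveform returns one feature vector per retained frame together with the per-frame class labels, so the number of feature vectors can be smaller than the number of frames when silence is dropped.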
Pages: 222-226
Number of pages: 5