A large vocabulary continuous speech recognition system for Persian language

Cited by: 12
Authors
Sameti, Hossein [1]
Veisi, Hadi [1]
Bahrani, Mohammad [1]
Babaali, Bagher [1]
Hosseinzadeh, Khosro [1]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran
Keywords
ADAPTATION; COMPONENT;
DOI
10.1186/1687-4722-2011-426795
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper introduces the first large vocabulary speech recognition system for the Persian language. This continuous speech recognition system uses most standard and state-of-the-art speech and language modeling techniques. Development of the system, called Nevisa, started in 2003 with a predominantly academic focus. The engine incorporates customized versions of the established components of traditional continuous speech recognizers, and its parameters have been optimized for real applications of the Persian language. For this purpose, we had to identify the computational challenges of the Persian language, especially in text processing, and extract statistical and grammatical language models for it. To achieve this, we had to either generate the necessary speech and text corpora or modify the primitive corpora already available for the Persian language. In the proposed system, acoustic modeling is based on hidden Markov models, and optimized decoding, pruning, and language modeling techniques are used. Both statistical and grammatical language models are incorporated in the system. An MFCC representation with some modifications is used as the speech signal feature. In addition, a voice activity detector (VAD) was designed and implemented based on signal energy and zero-crossing rate. Nevisa is equipped with out-of-vocabulary capability for applications with medium or small vocabulary sizes. Robustness techniques were also utilized in the system. Model-based approaches such as PMC, MLLR, and MAP, feature robustness methods such as CMS, PCA, RCC, and VTLN, and speech enhancement methods such as spectral subtraction and Wiener filtering, along with their modified versions, were implemented and evaluated in the system. A new robustness method called PC-PMC was also proposed and incorporated in the system.
To evaluate the performance and optimize the parameters of the system in noisy-environment tasks, four real noisy speech data sets were generated. The final performance of Nevisa in noisy environments is similar to that in clean conditions, thanks to the various robustness methods implemented in the system. The overall recognition performance of the system in clean and noisy conditions indicates that the system is a real-world product as well as a competitive ASR engine.
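The abstract states that the VAD is based on signal energy and zero-crossing rate but gives no further detail. The following is a minimal Python sketch of that general idea, not the Nevisa implementation: the function names, frame sizes, thresholds, and the decision rule (speech if energy is high, or if energy is moderate and the zero-crossing rate is high) are all illustrative assumptions.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def simple_vad(samples, frame_len=200, hop=80,
               energy_thresh=1e-3, zcr_thresh=0.25):
    """Return one boolean per frame (True = speech).

    All thresholds are hypothetical defaults chosen for illustration;
    a real system would tune them on data or adapt them to the noise floor.
    """
    decisions = []
    for frame in frame_signal(samples, frame_len, hop):
        energy = short_time_energy(frame)
        zcr = zero_crossing_rate(frame)
        # High energy suggests voiced speech; moderate energy combined with
        # a high zero-crossing rate may indicate unvoiced speech (assumed rule).
        is_speech = (energy > energy_thresh
                     or (energy > 0.1 * energy_thresh and zcr > zcr_thresh))
        decisions.append(is_speech)
    return decisions

# Toy input: silence, a 200 Hz tone sampled at 8 kHz, then silence again.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
signal = [0.0] * 800 + tone + [0.0] * 800
decisions = simple_vad(signal)
```

On this toy input, frames that lie entirely in the silent segments are labeled non-speech and frames that lie entirely inside the tone are labeled speech. Real noisy recordings would need thresholds estimated from the background noise rather than fixed constants.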
Pages: 1 / 12
Number of pages: 12