Multiple Feature Extraction for RNN-based Assamese Speech Recognition for Speech to Text Conversion Application

Authors
Dutta, Krishna [1 ]
Sarma, Kandarpa Kumar [1 ]
Affiliations
[1] Gauhati Univ, Dept ECE, Gauhati 781014, Assam, India
Source
PROCEEDINGS OF THE 2012 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, DEVICES AND INTELLIGENT SYSTEMS (CODIS) | 2012
Keywords
Moving Average Filter; LPC; MFCC; RNN
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This work proposes a prototype model for speech recognition in the Assamese language using Linear Predictive Coding (LPC) and Mel-frequency cepstral coefficient (MFCC) features. The speech recognizer is part of a speech-to-text conversion system. The LPC and MFCC features are handled by two different Recurrent Neural Networks (RNNs), which are used to recognize the vocal extract of Assamese, a major language in the north-eastern part of India. The decision block is designed as a combined framework of these RNN blocks. With this combined architecture, the system achieves a 10% gain in recognition rate over the case where the individual architectures are used.
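The combined decision block described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the outputs of the two RNNs are merged, so the weighted-average rule below is an assumption for illustration only, and the names `fuse_scores`, `decide`, and `weight` are hypothetical. The score lists stand in for per-class outputs of an LPC-driven network and an MFCC-driven network.

```python
def fuse_scores(lpc_scores, mfcc_scores, weight=0.5):
    """Weighted average of two per-class score lists.

    Assumption: this fusion rule is illustrative; the paper's abstract
    does not specify how the two RNN outputs are combined.
    """
    if len(lpc_scores) != len(mfcc_scores):
        raise ValueError("both networks must score the same set of classes")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(lpc_scores, mfcc_scores)]


def decide(lpc_scores, mfcc_scores, weight=0.5):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(lpc_scores, mfcc_scores, weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores for three classes: the two networks disagree,
# and the fused score settles the decision.
lpc_out = [0.50, 0.40, 0.10]   # hypothetical output of the LPC-based RNN
mfcc_out = [0.20, 0.70, 0.10]  # hypothetical output of the MFCC-based RNN
predicted = decide(lpc_out, mfcc_out)
```

With equal weights, a class that one network ranks first only by a small margin can be overruled by a more confident vote from the other network, which is one plausible way a combined architecture could outperform either network alone.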
Pages: 600-603
Number of pages: 4