Confusion analysis in phoneme based speech recognition in Hindi

Authors
Shobha Bhatt
Amita Dev
Anurag Jain
Affiliations
[1] GGSIP University, University School of Information and Communication Technology
[2] Indira Gandhi Delhi Technical University for Women
Keywords
Phoneme; Speech recognition; Hidden Markov Model; PLP
DOI
Not available
Abstract
Phoneme recognition is an essential step in developing a speech recognition system (SRS), because phonemes are the fundamental building blocks of a spoken language, and its accuracy is the foundation of an efficient SRS. This work presents phoneme recognition with systematic confusion analysis for the Hindi language; such analysis is essential for improving speech recognition performance. Experiments were conducted on a continuous Hindi speech corpus in speaker-dependent mode using the Hidden Markov Model (HMM) based toolkit HTK. Perceptual Linear Predictive (PLP) coefficients were used for feature extraction with five-state monophone HMMs. Tests explored the recognition of Hindi vowels and consonants. Confusion matrices are presented for both vowels and consonants, with analysis and possible solutions. For the systematic analysis, vowels were divided into front, middle, and back vowels, while consonants were categorized by place and manner of articulation. The findings show that some Hindi phonemes have significant effects on speech recognition. The investigations also reveal that some Hindi phonemes are confused most often, and that some have more deletions and insertions than others. The research further demonstrates that words made up of fewer phonemes show more insertion errors. It was also found that most Hindi sentences end with certain specific words, which can be used to reduce the search space in language modeling and so improve speech recognition. The findings can be used to enhance speech recognition performance by selecting suitable feature extraction and classification techniques for phonemes.
The outcome of the research can also be used to develop improved pronunciation dictionaries and to design text for a phonetically balanced speech corpus, improving speech recognition. Experimental results show an average corrected recognition score of 70% for the vowel class and consonant categories; the maximum average corrected recognition score of 94% was obtained with palatal sounds, and the lowest, 54%, with liquid sounds. The presented work is compared with similar existing work.
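As a rough illustration of the kind of bookkeeping behind a confusion analysis, the sketch below tallies aligned reference/recognized phoneme pairs and derives the two scores that HTK's HResults tool reports: percent correct, (N - D - S) / N, and accuracy, (N - D - S - I) / N. It assumes the abstract's "corrected recognition score" corresponds to HTK's percent correct, which the abstract does not state. The function names and the sample pairs are hypothetical and are not data from the paper.

```python
from collections import Counter


def htk_scores(n_labels, deletions, substitutions, insertions):
    """Return (percent_correct, percent_accuracy) using the HResults formulas."""
    hits = n_labels - deletions - substitutions
    percent_correct = 100.0 * hits / n_labels
    percent_accuracy = 100.0 * (hits - insertions) / n_labels
    return percent_correct, percent_accuracy


def error_counts(aligned_pairs):
    """Count labels, deletions, substitutions, and insertions.

    Each pair is (reference, recognized); None on the recognized side
    marks a deletion, None on the reference side marks an insertion.
    """
    confusion = Counter(aligned_pairs)
    n_labels = n_del = n_sub = n_ins = 0
    for (ref, hyp), count in confusion.items():
        if ref is None:
            n_ins += count
            continue
        n_labels += count
        if hyp is None:
            n_del += count
        elif hyp != ref:
            n_sub += count
    return confusion, n_labels, n_del, n_sub, n_ins


# Hypothetical alignment, for illustration only (not from the paper).
pairs = [("a", "a"), ("i", "e"), ("k", "k"), ("r", None), (None, "a"), ("l", "r")]
confusion, n_labels, n_del, n_sub, n_ins = error_counts(pairs)
correct, accuracy = htk_scores(n_labels, n_del, n_sub, n_ins)
print(f"correct={correct:.1f}% accuracy={accuracy:.1f}%")
```

The `confusion` counter doubles as a sparse confusion matrix: each off-diagonal key such as `("l", "r")` records how often one phoneme was recognized as another.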
Pages: 4213-4238
Number of pages: 25