A hierarchical language identification system for Indian languages

Cited by: 41
Authors
Jothilakshmi, S. [1]
Ramalingam, V. [1]
Palanivel, S. [1]
Affiliations
[1] Annamalai Univ, Dept Comp Sci & Engn, Annamalainagar 608002, Tamil Nadu, India
Keywords
Language identification; Mel frequency cepstral coefficients; Shifted delta cepstral coefficients; Hidden Markov model; Gaussian mixture model; Neural networks; Indian languages; SPOKEN; RECOGNITION; SPEECH
DOI
10.1016/j.dsp.2011.11.008
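Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract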
Automatic spoken language identification (LID) is the task of identifying the language from a short segment of speech uttered by an unknown speaker. This work develops a two-level language identification system for Indian languages using acoustic features. The first level identifies the family of the spoken language; the utterance is then passed to the second level, which identifies the particular language within that family. The performance of the system is analyzed for various acoustic features and different classifiers, and a suitable acoustic feature and pattern classification model are suggested for effective identification of Indian languages. The system has been modeled using hidden Markov models (HMM), Gaussian mixture models (GMM), and artificial neural networks (ANN). We studied the discriminative power of the system for three feature sets: mel frequency cepstral coefficients (MFCC), MFCC with delta and acceleration coefficients, and shifted delta cepstral (SDC) coefficients. LID performance was then studied as a function of training and testing set size. To carry out the experiments, a new database was created for 9 Indian languages. The GMM-based LID system using MFCC with delta and acceleration coefficients performs well, with 80.56% accuracy. The GMM-based LID system with SDC also achieves considerable performance. (C) 2012 Elsevier Inc. All rights reserved.
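The two-level decision described in the abstract (family first, then language within that family) can be illustrated with a minimal sketch. This is not the authors' implementation: each class is modeled here by a single diagonal Gaussian as a simplified stand-in for a GMM, the 2-dimensional "feature frames" are synthetic placeholders for MFCC-style vectors, and all names (`lang_a1`, `family_A`, `fit_gaussian`, `identify`, and so on) are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian(frames):
    """Fit one diagonal Gaussian (per-dimension mean and variance) to frames.
    Assumption: a single Gaussian stands in for a multi-component GMM."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # A small floor keeps the variance strictly positive.
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n + 1e-6
           for d in range(dim)]
    return mean, var


def avg_log_likelihood(model, frames):
    """Average per-frame log-likelihood of frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def train(data, families):
    """data: {language: frames}; families: {language: family}.
    Returns one model per family (pooled frames) and one per language."""
    pooled = defaultdict(list)
    for lang, frames in data.items():
        pooled[families[lang]].extend(frames)
    family_models = {fam: fit_gaussian(fr) for fam, fr in pooled.items()}
    language_models = {lang: fit_gaussian(fr) for lang, fr in data.items()}
    return family_models, language_models


def identify(frames, family_models, language_models, families):
    """Level 1 picks the most likely family; level 2 picks the most likely
    language among the languages belonging to that family only."""
    family = max(family_models,
                 key=lambda f: avg_log_likelihood(family_models[f], frames))
    candidates = [l for l, f in families.items() if f == family]
    language = max(candidates,
                   key=lambda l: avg_log_likelihood(language_models[l], frames))
    return family, language


# Synthetic toy data (hypothetical labels, not the paper's database).
FAMILIES = {"lang_a1": "family_A", "lang_a2": "family_A",
            "lang_b1": "family_B", "lang_b2": "family_B"}
DATA = {
    "lang_a1": [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1], [0.1, 0.2]],
    "lang_a2": [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]],
    "lang_b1": [[8.0, 8.0], [8.2, 7.9], [7.9, 8.1], [8.1, 8.2]],
    "lang_b2": [[9.0, 9.0], [9.2, 8.9], [8.9, 9.1], [9.1, 9.2]],
}
FAMILY_MODELS, LANGUAGE_MODELS = train(DATA, FAMILIES)

utterance = [[1.05, 1.0], [0.95, 1.1]]
print(identify(utterance, FAMILY_MODELS, LANGUAGE_MODELS, FAMILIES))
```

In this sketch the second level only compares languages within the family chosen at the first level, so a family-level error cannot be corrected later; that trade-off is inherent to any hierarchical scheme of this shape.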
Pages: 544-553
Number of pages: 10