A hierarchical language identification system for Indian languages

Cited by: 41
Authors
Jothilakshmi, S. [1]
Ramalingam, V. [1]
Palanivel, S. [1]
Affiliations
[1] Annamalai Univ, Dept Comp Sci & Engn, Annamalainagar 608002, Tamil Nadu, India
Keywords
Language identification; Mel frequency cepstral coefficients; Shifted delta cepstral coefficients; Hidden Markov model; Gaussian mixture model; Neural networks; Indian languages; SPOKEN; RECOGNITION; SPEECH
DOI
10.1016/j.dsp.2011.11.008
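Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809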
Abstract
Automatic spoken Language IDentification (LID) is the task of identifying the language from a short speech signal uttered by an unknown speaker. In this work, a two-level language identification system for Indian languages is developed using acoustic features. In the first level, the system identifies the family of the spoken language; the utterance is then passed to the second level, which identifies the particular language within that family. The performance of the system is analyzed for various acoustic features and different classifiers, and a suitable acoustic feature and pattern classification model are suggested for effective identification of Indian languages. The system has been modeled using hidden Markov models (HMM), Gaussian mixture models (GMM) and artificial neural networks (ANN). We studied the discriminative power of the system for three feature sets: mel frequency cepstral coefficients (MFCC), MFCC with delta and acceleration coefficients, and shifted delta cepstral (SDC) coefficients. LID performance was then studied as a function of training and testing set sizes. To carry out the experiments, a new database was created for 9 Indian languages. The GMM-based LID system using MFCC with delta and acceleration coefficients performs well, with 80.56% accuracy. The performance of the GMM-based LID system with SDC is also considerable. (C) 2012 Elsevier Inc. All rights reserved.
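The two-level decision described in the abstract (family first, then language within that family) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses one diagonal-covariance Gaussian per class, which is a one-component stand-in for a full GMM, and synthetic 2-D frames in place of MFCC features. The names `family_A`, `lang_A1`, `make_frames`, `train` and `identify` are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_gaussian(frames):
    """Fit a diagonal-covariance Gaussian (a one-component GMM stand-in)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def avg_log_likelihood(model, frames):
    """Average per-frame log-likelihood of the frames under the model."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def train(data):
    """data: {family: {language: frames}} -> (family models, language models)."""
    family_models, language_models = {}, {}
    for family, langs in data.items():
        # Level 1: one model per family, trained on all of its languages pooled.
        pooled = [f for frames in langs.values() for f in frames]
        family_models[family] = fit_gaussian(pooled)
        # Level 2: one model per language, grouped by family.
        language_models[family] = {
            lang: fit_gaussian(frames) for lang, frames in langs.items()
        }
    return family_models, language_models


def identify(frames, family_models, language_models):
    """Pick the most likely family, then the most likely language within it."""
    family = max(family_models,
                 key=lambda k: avg_log_likelihood(family_models[k], frames))
    candidates = language_models[family]
    language = max(candidates,
                   key=lambda k: avg_log_likelihood(candidates[k], frames))
    return family, language


def make_frames(center, n, rng):
    """Synthetic feature frames: unit-variance noise around a class center."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


rng = random.Random(0)
# Hypothetical families and languages; the class centers are made up.
data = {
    "family_A": {"lang_A1": make_frames((0.0, 0.0), 200, rng),
                 "lang_A2": make_frames((0.0, 3.0), 200, rng)},
    "family_B": {"lang_B1": make_frames((10.0, 0.0), 200, rng),
                 "lang_B2": make_frames((10.0, 3.0), 200, rng)},
}
family_models, language_models = train(data)

utterance = make_frames((10.0, 3.0), 50, rng)  # unseen frames from lang_B2
print(identify(utterance, family_models, language_models))
```

In this sketch the second level only compares languages inside the family chosen at the first level, so a family-level error cannot be recovered later; that trade-off is inherent to any hard two-stage decision of this kind.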
Pages: 544-553
Page count: 10