A hierarchical language identification system for Indian languages

Cited by: 41
Authors
Jothilakshmi, S. [1]
Ramalingam, V. [1]
Palanivel, S. [1]
Affiliations
[1] Annamalai Univ, Dept Comp Sci & Engn, Annamalainagar 608002, Tamil Nadu, India
Keywords
Language identification; Mel frequency cepstral coefficients; Shifted delta cepstral coefficients; Hidden Markov model; Gaussian mixture model; Neural networks; Indian languages; SPOKEN; RECOGNITION; SPEECH;
DOI
10.1016/j.dsp.2011.11.008
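Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract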
Automatic spoken Language IDentification (LID) is the task of identifying the language from a short duration of speech signal uttered by an unknown speaker. In this work, an attempt has been made to develop a two-level language identification system for Indian languages using acoustic features. In the first level, the system identifies the family of the spoken language; the utterance is then passed to the second level, which aims at identifying the particular language within the corresponding family. The performance of the system is analyzed for various acoustic features and different classifiers. A suitable acoustic feature and pattern classification model are suggested for effective identification of Indian languages. The system has been modeled using hidden Markov models (HMM), Gaussian mixture models (GMM) and artificial neural networks (ANN). We studied the discriminative power of the system for three feature sets: mel frequency cepstral coefficients (MFCC), MFCC with delta and acceleration coefficients, and shifted delta cepstral (SDC) coefficients. The LID performance as a function of different training and testing set sizes has then been studied. To carry out the experiments, a new database has been created for 9 Indian languages. It is shown that the GMM-based LID system using MFCC with delta and acceleration coefficients performs well, with 80.56% accuracy. The performance of the GMM-based LID system with SDC is also considerable. (C) 2012 Elsevier Inc. All rights reserved.
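The two-level decision described in the abstract (family first, then language within that family) can be illustrated with a minimal sketch. This is not the authors' implementation: each class is scored with a single diagonal Gaussian as a simplified stand-in for a GMM, the 2-D feature vectors are toy values standing in for MFCC frames, and all family and language names (`family_A`, `lang_A1`, and so on) are hypothetical placeholders, since the abstract does not list them.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]
Scorer = Callable[[List[Frame]], float]


def diag_gaussian_scorer(mean: Sequence[float], var: Sequence[float]) -> Scorer:
    """Return a function giving the average per-frame log-likelihood
    under one diagonal Gaussian (a simplified stand-in for a GMM)."""
    def score(frames: List[Frame]) -> float:
        total = 0.0
        for x in frames:
            for xi, m, v in zip(x, mean, var):
                total += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        return total / len(frames)
    return score


def identify(
    frames: List[Frame],
    family_models: Dict[str, Scorer],
    language_models: Dict[str, Dict[str, Scorer]],
) -> Tuple[str, str]:
    """Level 1: pick the best-scoring family.
    Level 2: pick the best-scoring language among that family's models only."""
    family = max(family_models, key=lambda f: family_models[f](frames))
    candidates = language_models[family]
    language = max(candidates, key=lambda name: candidates[name](frames))
    return family, language


# Hypothetical toy models; real systems would train these on acoustic features.
family_models = {
    "family_A": diag_gaussian_scorer([0.0, 0.0], [1.0, 1.0]),
    "family_B": diag_gaussian_scorer([5.0, 5.0], [1.0, 1.0]),
}
language_models = {
    "family_A": {
        "lang_A1": diag_gaussian_scorer([-1.0, 0.0], [1.0, 1.0]),
        "lang_A2": diag_gaussian_scorer([1.0, 0.0], [1.0, 1.0]),
    },
    "family_B": {
        "lang_B1": diag_gaussian_scorer([5.0, 4.0], [1.0, 1.0]),
        "lang_B2": diag_gaussian_scorer([5.0, 6.0], [1.0, 1.0]),
    },
}

utterance = [[0.9, 0.1], [1.2, -0.2]]  # toy frames near lang_A2's mean
print(identify(utterance, family_models, language_models))
```

One consequence of this design is that a mistake at the family level cannot be recovered at the language level, because the second stage only compares languages inside the chosen family.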
Pages: 544-553
Page count: 10