Automatic Language Identification Using Speech Rhythm Features for Multi-Lingual Speech Recognition

Cited by: 8
Authors
Kim, Hwamin [1]
Park, Jeong-Sik [2]
Affiliations
[1] Hankuk Univ Foreign Studies, Dept English Linguist, Seoul 02450, South Korea
[2] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 07
Funding
National Research Foundation of Singapore;
Keywords
language identification; rhythm metrics; Gaussian mixture model; linear mixed effect model; i-vector; convolutional neural network; SPEAKER;
DOI
10.3390/app10072225
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Featured Application: This research can be applied to a multi-lingual automatic speech recognition system that handles input speech in two or more languages. Such a system must rapidly identify the language of the input speech in order to route the speech to a recognition server targeting that language.
Abstract: Conventional speech recognition systems handle input speech in a single, specific language. To realize multi-lingual speech recognition, the language must first be identified from the input speech. This study proposes an efficient Language IDentification (LID) approach for such a multi-lingual system. Standard LID tasks depend on the common acoustic features used in speech recognition. However, these features may convey insufficient language-specific information, as they aim to discriminate the general tendency of phonemic information. This study investigates another type of feature that characterizes language-specific properties while taking computational complexity into account. We focus on speech rhythm features, which provide the prosodic characteristics of speech signals. Rhythm features represent the tendency of consonants and vowels in a language, so consonants and vowels must first be classified from the speech signal. For rapid classification, we employ Gaussian Mixture Model (GMM)-based learning, in which two GMMs corresponding to consonants and vowels are first trained and then used to classify them. Using the classification results, we estimate the tendency of the two phonemic groups, such as the duration of consonantal and vocalic intervals, and calculate rhythm metrics called the R-vector. In experiments on several speech corpora, the automatically extracted R-vector showed language tendencies similar to those reported in conventional linguistic studies. In addition, the proposed R-vector-based LID approach demonstrated LID performance superior or comparable to that of conventional approaches despite its low computational complexity.
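The step from consonant/vowel classification to rhythm metrics can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not list which metrics make up the R-vector, so the metrics below (%V, the standard deviations of vocalic and consonantal interval durations, and the vocalic nPVI) are assumptions taken from common usage in the rhythm literature. The 10 ms frame shift and the function names are also hypothetical, and the per-frame `'C'`/`'V'` labels are assumed to have already been produced by a classifier such as the two GMMs described above.

```python
from itertools import groupby
from statistics import pstdev

FRAME_SEC = 0.01  # assumed 10 ms frame shift (not stated in the abstract)


def intervals(labels, frame_sec=FRAME_SEC):
    """Collapse per-frame 'C'/'V' labels into (label, duration) intervals."""
    return [(lab, sum(1 for _ in run) * frame_sec) for lab, run in groupby(labels)]


def npvi(durations):
    """Normalized pairwise variability index over successive durations."""
    if len(durations) < 2:
        return 0.0
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


def rhythm_metrics(labels, frame_sec=FRAME_SEC):
    """Compute illustrative rhythm metrics (assumed, not the paper's exact R-vector)."""
    ivs = intervals(labels, frame_sec)
    v = [d for lab, d in ivs if lab == "V"]  # vocalic interval durations
    c = [d for lab, d in ivs if lab == "C"]  # consonantal interval durations
    total = sum(v) + sum(c)
    return {
        "percent_V": 100 * sum(v) / total if total else 0.0,
        "delta_V": pstdev(v) if v else 0.0,
        "delta_C": pstdev(c) if c else 0.0,
        "nPVI_V": npvi(v),
    }


if __name__ == "__main__":
    # Hypothetical frame labels: C(2 frames) V(4) C(3) V(2)
    print(rhythm_metrics(list("CCVVVVCCCVV")))
```

Working on interval durations rather than raw frames keeps the feature vector small, which is in line with the low computational cost the abstract emphasizes.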
Pages: 18
Related papers
50 in total
  • [41] Exploring Multi-Task Multi-Lingual Learning of Transformer Models for Hate Speech and Offensive Speech Identification in Social Media
    Mishra S.
    Prasad S.
    Mishra S.
    SN Computer Science, 2021, 2 (2)
  • [42] Analysis of Multi-Lingual Emotion Recognition Using Auditory Attention Features
    Kalinli, Ozlem
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3613 - 3617
  • [43] MULTI-LINGUAL SPEECH RECOGNITION WITH LOW-RANK MULTI-TASK DEEP NEURAL NETWORKS
    Mohan, Aanchan
    Rose, Richard
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4994 - 4998
  • [44] MULTI-LINGUAL DEEP NEURAL NETWORKS FOR LANGUAGE RECOGNITION
    Marcos, Luis Murphy
    Richardson, Frederick
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 330 - 334
  • [45] A comprehensive review on detection of hate speech for multi-lingual data
    Narula, Rachna
    Chaudhary, Poonam
    SOCIAL NETWORK ANALYSIS AND MINING, 2024, 14 (01)
  • [46] Depression-level assessment from multi-lingual conversational speech data using acoustic and text features
    Demiroglu, Cenk
    Besirli, Asli
    Ozkanca, Yasin
    Celik, Selime
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2020, 2020 (01)
  • [47] Depression-level assessment from multi-lingual conversational speech data using acoustic and text features
    Cenk Demiroglu
    Aslı Beşirli
    Yasin Ozkanca
    Selime Çelik
    EURASIP Journal on Audio, Speech, and Music Processing, 2020
  • [48] Topological invariants as speech features for automatic speech recognition
    Kacur, Juraj
    Chudy, Vladimir
    INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2014, 7 (04) : 235 - 244
  • [49] Language identification in multi-lingual web-documents
    Mandl, Thomas
    Shramko, Margaryta
    Tartakovski, Olga
    Womser-Hacker, Christa
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS, 2006, 3999 : 153 - 163
  • [50] Multi-Task Based Mispronunciation Detection of Children Speech Using Multi-Lingual Information
    Wei, Linxuan
    Dong, Wenwei
    Lin, Binghuai
    Zhang, Jinsong
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1791 - 1794