Automatic Language Identification Using Speech Rhythm Features for Multi-Lingual Speech Recognition

Cited by: 8
Authors
Kim, Hwamin [1]
Park, Jeong-Sik [2]
Affiliations
[1] Hankuk Univ Foreign Studies, Dept English Linguist, Seoul 02450, South Korea
[2] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea
Source
APPLIED SCIENCES-BASEL | 2020, Vol. 10, Issue 07
Funding
National Research Foundation, Singapore;
Keywords
language identification; rhythm metrics; Gaussian mixture model; linear mixed effect model; i-vector; convolutional neural network; SPEAKER;
DOI
10.3390/app10072225
Chinese Library Classification (CLC): O6 [Chemistry]
Discipline code: 0703
Abstract
Featured Application: This research can be applied to a multi-lingual automatic speech recognition system that handles input speech in two or more languages. Such a system must rapidly identify the language of the input speech so that the speech can be forwarded to a recognition server targeting that language.
Abstract: Conventional speech recognition systems handle input speech in a single, specific language. To realize multi-lingual speech recognition, the language must first be identified from the input speech. This study proposes an efficient Language IDentification (LID) approach for multi-lingual systems. Standard LID tasks rely on the common acoustic features used in speech recognition. However, these features may convey insufficient language-specific information, as they are designed to discriminate the general tendency of phonemic information. Taking computational complexity into account, this study investigates another type of feature that characterizes language-specific properties: speech rhythm features, which capture the prosodic characteristics of speech signals. Rhythm features reflect the tendency of consonants and vowels in a language, so these two phonemic groups must be classified from the speech signal. For rapid classification, we employ Gaussian Mixture Model (GMM)-based learning, in which two GMMs corresponding to consonants and vowels are first trained and then used to classify them. From the classification results, we estimate the tendency of the two phonemic groups, such as the durations of consonantal and vocalic intervals, and calculate rhythm metrics referred to as the R-vector. In experiments on several speech corpora, the automatically extracted R-vector exhibited language tendencies similar to those reported in conventional linguistic studies. In addition, the proposed R-vector-based LID approach demonstrated LID performance superior or comparable to conventional approaches despite its low computational complexity.
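The pipeline described in the abstract (two GMMs classify frames as consonantal or vocalic, intervals are formed from the frame labels, and rhythm metrics are computed from the interval durations) can be illustrated with a minimal sketch. The example below is an illustrative assumption, not the authors' implementation: it assumes 13-dimensional MFCC-like frames at a 10 ms hop and takes the R-vector to consist of standard rhythm metrics (%V, deltaC, deltaV, nPVI); all function names and parameter values are hypothetical.

```python
# Illustrative sketch (not the paper's code): frame-level consonant/vowel
# classification with two GMMs, followed by rhythm-metric ("R-vector") extraction.
import numpy as np
from sklearn.mixture import GaussianMixture

FRAME_SHIFT = 0.01  # assumed 10 ms frame hop


def train_phone_class_gmms(consonant_feats, vowel_feats, n_mix=8, seed=0):
    """Train one GMM per phonemic group on acoustic frames (e.g., MFCCs)."""
    gmm_c = GaussianMixture(n_components=n_mix, covariance_type="diag",
                            random_state=seed).fit(consonant_feats)
    gmm_v = GaussianMixture(n_components=n_mix, covariance_type="diag",
                            random_state=seed).fit(vowel_feats)
    return gmm_c, gmm_v


def classify_frames(feats, gmm_c, gmm_v):
    """Label each frame 0 (consonantal) or 1 (vocalic) by GMM log-likelihood."""
    return (gmm_v.score_samples(feats) > gmm_c.score_samples(feats)).astype(int)


def frames_to_intervals(labels):
    """Merge runs of identically labelled frames into (label, duration) intervals."""
    intervals, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            intervals.append((labels[start], (i - start) * FRAME_SHIFT))
            start = i
    return intervals


def rhythm_metrics(intervals):
    """Example R-vector: [%V, deltaC, deltaV, nPVI over successive vocalic intervals]."""
    c_dur = np.array([d for lab, d in intervals if lab == 0])
    v_dur = np.array([d for lab, d in intervals if lab == 1])
    percent_v = v_dur.sum() / max(c_dur.sum() + v_dur.sum(), 1e-9)
    delta_c = c_dur.std() if c_dur.size else 0.0
    delta_v = v_dur.std() if v_dur.size else 0.0
    if v_dur.size > 1:
        npvi = 100 * np.mean([abs(a - b) / ((a + b) / 2)
                              for a, b in zip(v_dur[:-1], v_dur[1:])])
    else:
        npvi = 0.0
    return np.array([percent_v, delta_c, delta_v, npvi])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in MFCC-like features; real use would extract them from labelled speech.
    cons_train = rng.normal(0.0, 1.0, size=(500, 13))
    vow_train = rng.normal(1.5, 1.0, size=(500, 13))
    gmm_c, gmm_v = train_phone_class_gmms(cons_train, vow_train)

    utterance = rng.normal(0.8, 1.2, size=(300, 13))  # unlabelled test frames
    labels = classify_frames(utterance, gmm_c, gmm_v)
    r_vector = rhythm_metrics(frames_to_intervals(labels))
    print("R-vector [%V, deltaC, deltaV, nPVI]:", r_vector)
```

In a real system, the two GMMs would be trained on frames from phonetically labelled corpora, and the resulting R-vectors would feed a downstream language classifier such as the GMM, i-vector, or CNN back-ends mentioned in the keywords.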
Pages: 18
Related Papers
50 records in total
  • [31] An Introduction to the Chinese Speech Recognition Front-End of the NICT/ATR Multi-Lingual Speech Translation System
    Zhang, Jinsong
    Jitsuhiro, Takatoshi
    Yamamoto, Hirofumi
    Hu, Xinhui
    Nakamura, Satoshi
    TSINGHUA SCIENCE AND TECHNOLOGY, 2008, (04) : 545 - 552
  • [32] MULTI-LINGUAL MULTI-TASK SPEECH EMOTION RECOGNITION USING WAV2VEC 2.0
    Sharma, Mayank
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6907 - 6911
  • [33] JSPEECH: A MULTI-LINGUAL CONVERSATIONAL SPEECH CORPUS
    Choobbasti, Ali Janalizadeh
    Gholamian, Mohammad Erfan
    Vaheb, Amir
    Safavi, Saeid
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 927 - 933
  • [34] Development of the "VoiceTra" Multi-Lingual Speech Translation System
    Matsuda, Shigeki
    Hayashi, Teruaki
    Ashikari, Yutaka
    Shiga, Yoshinori
    Kashioka, Hidenori
    Yasuda, Keiji
    Okuma, Hideo
    Uchiyama, Masao
    Sumita, Eiichiro
    Kawai, Hisashi
    Nakamura, Satoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2017, E100D (04): : 621 - 632
  • [35] Cross-lingual Automatic Speech Recognition Exploiting Articulatory Features
    Zhan, Qingran
    Motlicek, Petr
    Du, Shixuan
    Shan, Yahui
    Ma, Sifan
    Xie, Xiang
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1912 - 1916
  • [36] Multi-Lingual Depression-Level Assessment from Conversational Speech Using Acoustic and Text Features
    Ozkanca, Yasin
    Demiroglu, Cenk
    Besirli, Asli
    Celik, Selime
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3398 - 3402
  • [37] Automatic language identification using large vocabulary continuous speech recognition
    Mendoza, S
    Gillick, L
    Ito, Y
    Lowe, S
    Newmann, M
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 785 - 788
  • [38] Multi-Lingual Speech Emotion Recognition: Investigating Similarities between English and German Languages
    Devi, Ghaayathri K.
    Likhitha, Kolluru
    Akshaya, J.
    Rfj, Gokul
    Lal, Jyothish G.
    2024 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATION AND APPLIED INFORMATICS, ACCAI 2024, 2024,
  • [39] A Multi-Lingual Speech Recognition-Based Framework to Human-Drone Interaction
    Choutri, Kheireddine
    Lagha, Mohand
    Meshoul, Souham
    Batouche, Mohamed
    Kacel, Yasmine
    Mebarkia, Nihad
    ELECTRONICS, 2022, 11 (12)
  • [40] AUTOMATIC LOCALIZATION OF A LANGUAGE-INDEPENDENT SUB-NETWORK ON DEEP NEURAL NETWORKS TRAINED BY MULTI-LINGUAL SPEECH
    Matsuda, Shigeki
    Lu, Xugang
    Kashioka, Hideki
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7359 - 7362