Automatic Language Identification Using Speech Rhythm Features for Multi-Lingual Speech Recognition

Cited by: 8
Authors
Kim, Hwamin [1]
Park, Jeong-Sik [2]
Affiliations
[1] Hankuk Univ Foreign Studies, Dept English Linguist, Seoul 02450, South Korea
[2] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 07
Funding
National Research Foundation of Singapore;
Keywords
language identification; rhythm metrics; Gaussian mixture model; linear mixed effect model; i-vector; convolutional neural network; SPEAKER;
DOI
10.3390/app10072225
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Featured Application: This research can be applied to a multi-lingual automatic speech recognition system that handles input speech in two or more languages. Such a system must rapidly identify the language of the input speech so that the speech can be routed to a recognition server targeting that language.

Conventional speech recognition systems handle input speech in a single, specific language. To realize multi-lingual speech recognition, the language must first be identified from the input speech. This study proposes an efficient Language IDentification (LID) approach for multi-lingual systems. Standard LID approaches depend on the common acoustic features used in speech recognition. However, these features may convey insufficient language-specific information, as they are designed to capture general phonemic information. This study therefore investigates another type of feature that characterizes language-specific properties, with computational complexity taken into account. We focus on speech rhythm features, which capture the prosodic characteristics of speech signals. Rhythm features represent the tendencies of consonants and vowels in a language, so consonants and vowels must first be classified from the speech signal. For rapid classification, we employ Gaussian Mixture Model (GMM)-based learning, in which two GMMs corresponding to consonants and vowels are first trained and then used to classify them. From the classification results, we estimate the tendencies of the two phonemic groups, such as the durations of consonantal and vocalic intervals, and calculate a set of rhythm metrics called the R-vector. In experiments on several speech corpora, the automatically extracted R-vector showed language tendencies similar to those reported in conventional linguistic studies. In addition, the proposed R-vector-based LID approach demonstrated LID performance superior or comparable to that of conventional approaches, despite its low computational complexity.
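The step from consonant/vowel classification results to rhythm metrics can be illustrated with a minimal sketch. It assumes the frame-level labels ("C" or "V") are already available, for example from the two GMMs, and that each frame covers 10 ms. The metrics shown (%V, ΔV, ΔC, nPVI, rPVI) are standard rhythm metrics from the linguistics literature; the abstract does not state which metrics make up the R-vector, so the selection here, the function names, and the use of the population standard deviation are all assumptions made for illustration.

```python
from itertools import groupby
from statistics import pstdev


def intervals(labels, frame_shift=0.01):
    """Collapse per-frame labels into (label, duration_in_seconds) runs."""
    return [(lab, len(list(run)) * frame_shift) for lab, run in groupby(labels)]


def _rpvi(durs):
    """Raw pairwise variability index: mean absolute successive difference."""
    pairs = list(zip(durs, durs[1:]))
    if not pairs:
        return 0.0
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def _npvi(durs):
    """Normalized PVI: each difference is divided by the mean of the pair."""
    pairs = list(zip(durs, durs[1:]))
    if not pairs:
        return 0.0
    return 100.0 * sum(abs(a - b) / ((a + b) / 2.0) for a, b in pairs) / len(pairs)


def rhythm_metrics(labels, frame_shift=0.01):
    """Compute illustrative rhythm metrics from a sequence of 'C'/'V' frame labels.

    Hypothetical helper: the metric set is an assumption, not the paper's R-vector.
    """
    runs = intervals(labels, frame_shift)
    v = [d for lab, d in runs if lab == "V"]  # vocalic interval durations
    c = [d for lab, d in runs if lab == "C"]  # consonantal interval durations
    if not v or not c:
        raise ValueError("need at least one vocalic and one consonantal interval")
    return {
        "percent_V": 100.0 * sum(v) / (sum(v) + sum(c)),  # share of vocalic time
        "delta_V": pstdev(v),  # std. dev. of vocalic interval durations
        "delta_C": pstdev(c),  # std. dev. of consonantal interval durations
        "nPVI_V": _npvi(v),
        "rPVI_C": _rpvi(c),
    }


if __name__ == "__main__":
    # Twelve 10 ms frames: C(2) V(4) C(1) V(2) C(3)
    print(rhythm_metrics("CCVVVVCVVCCC"))
```

Because the metrics need only run lengths of the label sequence, the cost after classification is linear in the number of frames, which is in line with the abstract's emphasis on low computational complexity.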
Pages: 18