Automatic Language Identification Using Speech Rhythm Features for Multi-Lingual Speech Recognition

Times cited: 7
Authors
Kim, Hwamin [1 ]
Park, Jeong-Sik [2 ]
Affiliations
[1] Hankuk Univ Foreign Studies, Dept English Linguist, Seoul 02450, South Korea
[2] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 07
Funding
National Research Foundation of Singapore
Keywords
language identification; rhythm metrics; Gaussian mixture model; linear mixed effect model; i-vector; convolutional neural network; SPEAKER;
DOI
10.3390/app10072225
CLC classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Featured Application: This research can be applied to a multi-lingual automatic speech recognition system that handles input speech in two or more languages. Such a system must rapidly identify the language of the input speech in order to route it to a recognition server for that language.
Abstract: Conventional speech recognition systems handle input speech in a single, specific language. To realize multi-lingual speech recognition, the language of the input speech must first be identified. This study proposes an efficient Language IDentification (LID) approach for such multi-lingual systems. Standard LID approaches rely on the common acoustic features used in speech recognition. However, these features may convey insufficient language-specific information, as they are designed to discriminate general phonemic information. This study investigates another type of feature that characterizes language-specific properties while keeping computational complexity low. We focus on speech rhythm features, which capture the prosodic characteristics of speech signals. Because rhythm features represent the tendencies of consonants and vowels in a language, consonants and vowels must first be classified from the speech signal. For rapid classification, we employ Gaussian Mixture Model (GMM)-based learning: two GMMs, one for consonants and one for vowels, are first trained and then used to classify the speech into consonants and vowels. Using the classification results, we estimate the tendencies of the two phonemic groups, such as the durations of consonantal and vocalic intervals, and calculate rhythm metrics collectively called the R-vector. In experiments on several speech corpora, the automatically extracted R-vector showed language tendencies similar to those reported in conventional linguistic studies. In addition, the proposed R-vector-based LID approach achieved LID performance superior or comparable to that of conventional approaches despite its low computational complexity.
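The abstract outlines a three-step pipeline: classify speech into consonants and vowels with two GMMs, merge the results into consonantal and vocalic intervals, and compute rhythm metrics from the interval durations. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. The GMM parameters are assumed to be already trained (EM training is omitted), frames are assumed to be acoustic feature vectors (for example MFCCs) at a hypothetical 10 ms frame shift, and because the abstract does not say which metrics make up the R-vector, the sketch substitutes classic rhythm metrics from the linguistics literature (%V, deltaC, deltaV, VarcoC, VarcoV, rPVI, nPVI). All names (`DiagGMM`, `label_frames`, `to_intervals`, `rhythm_vector`) are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class DiagGMM:
    """Hypothetical pre-trained GMM with diagonal covariances (EM training omitted)."""
    weights: list[float]
    means: list[list[float]]
    variances: list[list[float]]

    def log_likelihood(self, x: list[float]) -> float:
        # log sum_k w_k * N(x; mu_k, diag(var_k)), via log-sum-exp for numerical stability
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comps.append(ll)
        top = max(comps)
        return top + math.log(sum(math.exp(c - top) for c in comps))


def label_frames(frames, gmm_c: DiagGMM, gmm_v: DiagGMM) -> list[str]:
    """Label each feature frame 'C' or 'V' by the GMM with the higher log-likelihood."""
    return ["V" if gmm_v.log_likelihood(f) > gmm_c.log_likelihood(f) else "C" for f in frames]


def to_intervals(labels, frame_shift: float = 0.01) -> list[tuple[str, float]]:
    """Merge runs of identical labels into (label, duration in seconds) intervals."""
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1  # extend the current run by one frame
        else:
            runs.append([lab, 1])  # start a new interval
    return [(lab, count * frame_shift) for lab, count in runs]


def rpvi(durs: list[float]) -> float:
    # raw Pairwise Variability Index: mean absolute difference of successive durations
    return mean(abs(a - b) for a, b in zip(durs, durs[1:]))


def npvi(durs: list[float]) -> float:
    # normalized PVI: successive differences divided by the pair mean, scaled by 100
    return 100 * mean(abs(a - b) / ((a + b) / 2) for a, b in zip(durs, durs[1:]))


def rhythm_vector(intervals: list[tuple[str, float]]) -> dict[str, float]:
    """Stand-in R-vector built from classic rhythm metrics (assumed, not from the paper)."""
    c = [d for lab, d in intervals if lab == "C"]
    v = [d for lab, d in intervals if lab == "V"]
    return {
        "%V": 100 * sum(v) / (sum(v) + sum(c)),  # proportion of vocalic duration
        "deltaC": pstdev(c),                    # std of consonantal interval durations
        "deltaV": pstdev(v),                    # std of vocalic interval durations
        "VarcoC": 100 * pstdev(c) / mean(c),    # deltaC normalized by mean consonantal duration
        "VarcoV": 100 * pstdev(v) / mean(v),    # deltaV normalized by mean vocalic duration
        "rPVI_C": rpvi(c),
        "nPVI_V": npvi(v),
    }


if __name__ == "__main__":
    # Toy 1-D example: consonant frames near 0, vowel frames near 5
    gmm_c = DiagGMM([1.0], [[0.0]], [[1.0]])
    gmm_v = DiagGMM([1.0], [[5.0]], [[1.0]])
    frames = [[0.1], [0.2], [5.1], [4.9], [5.0], [0.0], [5.2], [5.3], [-0.1], [0.3], [0.2], [4.8]]
    labels = label_frames(frames, gmm_c, gmm_v)
    print("".join(labels))  # CCVVVCVVCCCV
    print(rhythm_vector(to_intervals(labels)))
```

A single Gaussian per class keeps the toy example readable; a practical system would use multi-component GMMs trained on labeled consonant and vowel frames. The PVI metrics are only defined when there are at least two intervals of each class.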
Pages: 18
Related papers (50 in total)
  • [1] Multi-lingual Transformer Training for Khmer Automatic Speech Recognition
    Soky, Kak
    Li, Sheng
    Kawahara, Tatsuya
    Seng, Sopheap
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1893 - 1896
  • [2] Parliament Archives Used for Automatic Training of Multi-lingual Automatic Speech Recognition Systems
    Nouza, Jan
    Safarik, Radek
    [J]. TEXT, SPEECH, AND DIALOGUE, TSD 2017, 2017, 10415 : 174 - 182
  • [3] Dataset and Evaluation of Automatic Speech Recognition for Multi-lingual Intent Recognition on Social Robots
    Andriella, Antonio
    Ros, Raquel
    Ellinson, Yoav
    Gannot, Sharon
    Lemaignan, Severin
    [J]. PROCEEDINGS OF THE 2024 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI 2024, 2024, : 865 - 869
  • [4] An automatic machine translation system for multi-lingual speech to Indian sign language
    Dhanjal, Amandeep Singh
    Singh, Williamjeet
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (03) : 4283 - 4321
  • [6] Automatic segmentation and labelling of multi-lingual speech data
    Vorstermans, A
    Martens, JP
    VanCoile, B
    [J]. SPEECH COMMUNICATION, 1996, 19 (04) : 271 - 293
  • [7] SERAB: A MULTI-LINGUAL BENCHMARK FOR SPEECH EMOTION RECOGNITION
    Scheidwasser-Clow, Neil
    Kegler, Mikolaj
    Beckmann, Pierre
    Cernak, Milos
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7697 - 7701
  • [8] A multi-lingual speech recognition system using a neural network approach
    Chen, OTC
    Chen, CY
    Chang, HT
    Hsu, FR
    Yang, HL
    Lee, YG
    [J]. ICNN - 1996 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS. 1-4, 1996, : 1576 - 1581
  • [9] Automatic learning of numeral grammars for multi-lingual speech synthesizers
    Flach, G
    Holzapfel, M
    Just, C
    Wachtler, A
    Wolff, M
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1291 - 1294
  • [10] Cross corpus multi-lingual speech emotion recognition using ensemble learning
    Zehra, Wisha
    Javed, Abdul Rehman
    Jalil, Zunera
    Khan, Habib Ullah
    Gadekallu, Thippa Reddy
    [J]. COMPLEX & INTELLIGENT SYSTEMS, 2021, 7 (04) : 1845 - 1854