Multi-lingual Transformer Training for Khmer Automatic Speech Recognition

Cited by: 0
Authors
Soky, Kak [1, 4, 5]
Li, Sheng [2]
Kawahara, Tatsuya [3]
Seng, Sopheap [1]
Affiliations
[1] Natl Inst Posts Telecoms & ICT NIPTICT, Phnom Penh, Cambodia
[2] Natl Inst Informat & Commun Technol NICT, Kyoto, Japan
[3] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto, Japan
[4] NIPTICT, Phnom Penh, Cambodia
[5] Minist Educ Youth & Sports MoEYS, Phnom Penh, Cambodia
Keywords
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Currently, there are three challenges in constructing reliable ASR systems for the Khmer language: (1) the lack of language resources (text and speech corpora) in digital form, (2) a writing system without explicit word boundaries, and (3) a pronunciation model that is not well studied. In this paper, to avoid the extensive work of selecting proper acoustic units (e.g., phones, syllables) and preparing frame-level labels for the traditional DNN-HMM framework, we directly use words or characters as labels with a state-of-the-art transformer-based end-to-end model. Moreover, we use a multi-lingual training framework to tackle the low-resource data problem. All experiments are performed on the Basic Travel Expression Corpus (BTEC) datasets. The experiments show that the proposed multi-lingual transformer-based end-to-end model achieves a significant improvement over the DNN-HMM baseline model.
Pages: 1893-1896
Number of pages: 4